< draft-ietf-ipfix-anon-05.txt   draft-ietf-ipfix-anon-06.txt >
IPFIX Working Group E. Boschi IPFIX Working Group E. Boschi
Internet-Draft B. Trammell Internet-Draft B. Trammell
Intended status: Experimental ETH Zurich Intended status: Experimental ETH Zurich
Expires: April 23, 2011 October 20, 2010 Expires: July 23, 2011 January 19, 2011
IP Flow Anonymisation Support IP Flow Anonymization Support
draft-ietf-ipfix-anon-05.txt draft-ietf-ipfix-anon-06.txt
Abstract Abstract
This document describes anonymisation techniques for IP flow data and This document describes anonymization techniques for IP flow data and
the export of anonymised data using the IPFIX protocol. It the export of anonymized data using the IPFIX protocol. It
categorizes common anonymisation schemes and defines the parameters categorizes common anonymization schemes and defines the parameters
needed to describe them. It provides guidelines for the needed to describe them. It provides guidelines for the
implementation of anonymised data export and storage over IPFIX, and implementation of anonymized data export and storage over IPFIX, and
describes an information model and Options-based method for describes an information model and Options-based method for
anonymisation metadata export within the IPFIX protocol or storage in anonymization metadata export within the IPFIX protocol or storage in
IPFIX Files. IPFIX Files.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 23, 2011. This Internet-Draft will expire on July 23, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. IPFIX Protocol Overview . . . . . . . . . . . . . . . . . 4 1.1. IPFIX Protocol Overview . . . . . . . . . . . . . . . . . 4
1.2. IPFIX Documents Overview . . . . . . . . . . . . . . . . . 5 1.2. IPFIX Documents Overview . . . . . . . . . . . . . . . . . 5
1.3. Anonymisation within the IPFIX Architecture . . . . . . . 5 1.3. Anonymization within the IPFIX Architecture . . . . . . . 5
1.4. Supporting Experimentation with Anonymization . . . . . . 6
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. Categorisation of Anonymisation Techniques . . . . . . . . . . 6 3. Categorization of Anonymization Techniques . . . . . . . . . . 7
4. Anonymisation of IP Flow Data . . . . . . . . . . . . . . . . 8 4. Anonymization of IP Flow Data . . . . . . . . . . . . . . . . 8
4.1. IP Address Anonymisation . . . . . . . . . . . . . . . . . 9 4.1. IP Address Anonymization . . . . . . . . . . . . . . . . . 10
4.1.1. Truncation . . . . . . . . . . . . . . . . . . . . . . 9 4.1.1. Truncation . . . . . . . . . . . . . . . . . . . . . . 11
4.1.2. Reverse Truncation . . . . . . . . . . . . . . . . . . 10 4.1.2. Reverse Truncation . . . . . . . . . . . . . . . . . . 11
4.1.3. Permutation . . . . . . . . . . . . . . . . . . . . . 10 4.1.3. Permutation . . . . . . . . . . . . . . . . . . . . . 11
4.1.4. Prefix-preserving Pseudonymisation . . . . . . . . . . 11 4.1.4. Prefix-preserving Pseudonymization . . . . . . . . . . 12
4.2. MAC Address Anonymisation . . . . . . . . . . . . . . . . 11 4.2. MAC Address Anonymization . . . . . . . . . . . . . . . . 12
4.2.1. Truncation . . . . . . . . . . . . . . . . . . . . . . 12 4.2.1. Truncation . . . . . . . . . . . . . . . . . . . . . . 13
4.2.2. Reverse Truncation . . . . . . . . . . . . . . . . . . 12 4.2.2. Reverse Truncation . . . . . . . . . . . . . . . . . . 13
4.2.3. Permutation . . . . . . . . . . . . . . . . . . . . . 13 4.2.3. Permutation . . . . . . . . . . . . . . . . . . . . . 14
4.2.4. Structured Pseudonymisation . . . . . . . . . . . . . 13 4.2.4. Structured Pseudonymization . . . . . . . . . . . . . 14
4.3. Timestamp Anonymisation . . . . . . . . . . . . . . . . . 13 4.3. Timestamp Anonymization . . . . . . . . . . . . . . . . . 15
4.3.1. Precision Degradation . . . . . . . . . . . . . . . . 14 4.3.1. Precision Degradation . . . . . . . . . . . . . . . . 15
4.3.2. Enumeration . . . . . . . . . . . . . . . . . . . . . 14 4.3.2. Enumeration . . . . . . . . . . . . . . . . . . . . . 16
4.3.3. Random Shifts . . . . . . . . . . . . . . . . . . . . 15 4.3.3. Random Shifts . . . . . . . . . . . . . . . . . . . . 16
4.4. Counter Anonymisation . . . . . . . . . . . . . . . . . . 15 4.4. Counter Anonymization . . . . . . . . . . . . . . . . . . 16
4.4.1. Precision Degradation . . . . . . . . . . . . . . . . 15 4.4.1. Precision Degradation . . . . . . . . . . . . . . . . 17
4.4.2. Binning . . . . . . . . . . . . . . . . . . . . . . . 15 4.4.2. Binning . . . . . . . . . . . . . . . . . . . . . . . 17
4.4.3. Random Noise Addition . . . . . . . . . . . . . . . . 16 4.4.3. Random Noise Addition . . . . . . . . . . . . . . . . 17
4.5. Anonymisation of Other Flow Fields . . . . . . . . . . . . 16 4.5. Anonymization of Other Flow Fields . . . . . . . . . . . . 17
4.5.1. Binning . . . . . . . . . . . . . . . . . . . . . . . 16 4.5.1. Binning . . . . . . . . . . . . . . . . . . . . . . . 18
4.5.2. Permutation . . . . . . . . . . . . . . . . . . . . . 17 4.5.2. Permutation . . . . . . . . . . . . . . . . . . . . . 18
5. Parameters for the Description of Anonymisation Techniques . . 17 5. Parameters for the Description of Anonymization Techniques . . 18
5.1. Stability . . . . . . . . . . . . . . . . . . . . . . . . 17 5.1. Stability . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2. Truncation Length . . . . . . . . . . . . . . . . . . . . 18 5.2. Truncation Length . . . . . . . . . . . . . . . . . . . . 19
5.3. Bin Map . . . . . . . . . . . . . . . . . . . . . . . . . 18 5.3. Bin Map . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4. Permutation . . . . . . . . . . . . . . . . . . . . . . . 18 5.4. Permutation . . . . . . . . . . . . . . . . . . . . . . . 20
5.5. Shift Amount . . . . . . . . . . . . . . . . . . . . . . . 19 5.5. Shift Amount . . . . . . . . . . . . . . . . . . . . . . . 20
6. Anonymisation Export Support in IPFIX . . . . . . . . . . . . 19 6. Anonymization Export Support in IPFIX . . . . . . . . . . . . 20
6.1. Anonymisation Records and the Anonymisation Options 6.1. Anonymization Records and the Anonymization Options
Template . . . . . . . . . . . . . . . . . . . . . . . . . 19 Template . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.2. Recommended Information Elements for Anonymisation 6.2. Recommended Information Elements for Anonymization
Metadata . . . . . . . . . . . . . . . . . . . . . . . . . 21 Metadata . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.2.1. informationElementIndex . . . . . . . . . . . . . . . 21 6.2.1. informationElementIndex . . . . . . . . . . . . . . . 23
6.2.2. anonymisationTechnique . . . . . . . . . . . . . . . . 22 6.2.2. anonymizationTechnique . . . . . . . . . . . . . . . . 23
6.2.3. anonymisationFlags . . . . . . . . . . . . . . . . . . 23 6.2.3. anonymizationFlags . . . . . . . . . . . . . . . . . . 25
7. Applying Anonymisation Techniques to IPFIX Export and 7. Applying Anonymization Techniques to IPFIX Export and
Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.1. Arrangement of Processes in IPFIX Anonymisation . . . . . 26 7.1. Arrangement of Processes in IPFIX Anonymization . . . . . 28
7.2. IPFIX-Specific Anonymisation Guidelines . . . . . . . . . 28 7.2. IPFIX-Specific Anonymization Guidelines . . . . . . . . . 30
7.2.1. Appropriate Use of Information Elements for 7.2.1. Appropriate Use of Information Elements for
Anonymised Data . . . . . . . . . . . . . . . . . . . 28 Anonymized Data . . . . . . . . . . . . . . . . . . . 30
7.2.2. Export of Perimeter-Based Anonymisation Policies . . . 29 7.2.2. Export of Perimeter-Based Anonymization Policies . . . 31
7.2.3. Anonymisation of Header Data . . . . . . . . . . . . . 30 7.2.3. Anonymization of Header Data . . . . . . . . . . . . . 32
7.2.4. Anonymisation of Options Data . . . . . . . . . . . . 30 7.2.4. Anonymization of Options Data . . . . . . . . . . . . 32
7.2.5. Special-Use Address Space Considerations . . . . . . . 32 7.2.5. Special-Use Address Space Considerations . . . . . . . 34
7.2.6. Protecting Out-of-Band Configuration and 7.2.6. Protecting Out-of-Band Configuration and
Management Data . . . . . . . . . . . . . . . . . . . 32 Management Data . . . . . . . . . . . . . . . . . . . 34
8. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 8. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
9. Security Considerations . . . . . . . . . . . . . . . . . . . 37 9. Security Considerations . . . . . . . . . . . . . . . . . . . 39
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41
11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 39 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 41
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 41
12.1. Normative References . . . . . . . . . . . . . . . . . . . 39 12.1. Normative References . . . . . . . . . . . . . . . . . . . 41
12.2. Informative References . . . . . . . . . . . . . . . . . . 40 12.2. Informative References . . . . . . . . . . . . . . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 40 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 43
1. Introduction 1. Introduction
The standardisation of an IP flow information export protocol The standardization of an IP flow information export protocol
[RFC5101] and associated representations removes a technical barrier [RFC5101] and associated representations removes a technical barrier
to the sharing of IP flow data across organizational boundaries and to the sharing of IP flow data across organizational boundaries and
with network operations, security, and research communities for a with network operations, security, and research communities for a
wide variety of purposes. However, with wider dissemination come wide variety of purposes. However, with wider dissemination come
greater risks to the privacy of the users of networks under greater risks to the privacy of the users of networks under
measurement, and to the security of those networks. While it is not measurement, and to the security of those networks. While it is not
a complete solution to the issues posed by distribution of IP flow a complete solution to the issues posed by distribution of IP flow
information, anonymisation (i.e., the deletion or transformation of information, anonymization (i.e., the deletion or transformation of
information that is considered sensitive and could be used to reveal information that is considered sensitive and could be used to reveal
the identity of subjects involved in a communication) is an important the identity of subjects involved in a communication) is an important
tool for the protection of privacy within network measurement tool for the protection of privacy within network measurement
infrastructures. infrastructures.
This document presents a mechanism for representing anonymised data This document presents a mechanism for representing anonymized data
within IPFIX and guidelines for using it. It begins with a within IPFIX and guidelines for using it. It is not intended as a
categorization of anonymisation techniques. It then describes general statement on the applicability of specific flow data
applicability of each technique to commonly anonymisable fields of IP anonymization techniques to specific situations, or as a
flow data, organized by information element data type and semantics recommendation of any particular application of anonymization to flow
as in [RFC5102]; enumerates the parameters required by each of the data export. Exporters or publishers of anonymized data must take
applicable anonymisation techniques; and provides guidelines for the care that the applied anonymization technique is appropriate for the
use of each of these techniques in accordance with best practices in data source, the purpose, and the risk of deanonymization of a given
data protection. Finally, it specifies a mechanism for exporting application.
anonymised data and binding anonymisation metadata to Templates and
Options Templates using IPFIX Options. It begins with a categorization of anonymization techniques. It then
describes applicability of each technique to commonly anonymizable
fields of IP flow data, organized by information element data type
and semantics as in [RFC5102]; enumerates the parameters required by
each of the applicable anonymization techniques; and provides
guidelines for the use of each of these techniques in accordance with
current best practices in data protection. Finally, it specifies a
mechanism for exporting anonymized data and binding anonymization
metadata to Templates and Options Templates using IPFIX Options.
1.1. IPFIX Protocol Overview 1.1. IPFIX Protocol Overview
In the IPFIX protocol, { type, length, value } tuples are expressed In the IPFIX protocol, { type, length, value } tuples are expressed
in Templates containing { type, length } pairs, specifying which { in Templates containing { type, length } pairs, specifying which {
value } fields are present in data records conforming to the value } fields are present in data records conforming to the
Template, giving great flexibility as to what data is transmitted. Template, giving great flexibility as to what data is transmitted.
Since Templates are sent very infrequently compared with Data Since Templates are sent very infrequently compared with Data
Records, this results in significant bandwidth savings. Various Records, this results in significant bandwidth savings. Various
different data formats may be transmitted simply by sending new different data formats may be transmitted simply by sending new
skipping to change at page 5, line 36 skipping to change at page 5, line 43
applications of the IPFIX protocol and their use of information applications of the IPFIX protocol and their use of information
exported via IPFIX, and relates the IPFIX architecture to other exported via IPFIX, and relates the IPFIX architecture to other
measurement architectures and frameworks. measurement architectures and frameworks.
Additionally, "Specification of the IPFIX File Format" [RFC5655] Additionally, "Specification of the IPFIX File Format" [RFC5655]
describes a file format based upon the IPFIX Protocol for the storage describes a file format based upon the IPFIX Protocol for the storage
of flow data. of flow data.
This document references the Protocol and Architecture documents for This document references the Protocol and Architecture documents for
terminology, and extends the IPFIX Information Model to provide new terminology, and extends the IPFIX Information Model to provide new
Information Elements for anonymisation metadata. The anonymisation Information Elements for anonymization metadata. The anonymization
techniques described herein are equally applicable to the IPFIX techniques described herein are equally applicable to the IPFIX
Protocol and data stored in IPFIX Files. Protocol and data stored in IPFIX Files.
1.3. Anonymisation within the IPFIX Architecture 1.3. Anonymization within the IPFIX Architecture
According to [RFC5470], IPFIX Message anonymisation is optionally According to [RFC5470], IPFIX Message anonymization is optionally
performed as the final operation before handing the Message to the performed as the final operation before handing the Message to the
transport protocol for export. While no provision is made in the transport protocol for export. While no provision is made in the
architecture for anonymisation metadata as in Section 6, this architecture for anonymization metadata as in Section 6, this
arrangement does allow for the rewriting necessary for comprehensive arrangement does allow for the rewriting necessary for comprehensive
anonymisation of IPFIX export as in Section 7. The development of anonymization of IPFIX export as in Section 7. The development of
the IPFIX Mediation [I-D.ietf-ipfix-mediators-framework] framework the IPFIX Mediation [I-D.ietf-ipfix-mediators-framework] framework
and the IPFIX File Format [RFC5655] expand upon this initial and the IPFIX File Format [RFC5655] expand upon this initial
architectural allowance for anonymisation by adding to the list of architectural allowance for anonymization by adding to the list of
places that anonymisation may be applied. The former specifies IPFIX places that anonymization may be applied. The former specifies IPFIX
Mediators, which rewrite existing IPFIX Messages, and the latter Mediators, which rewrite existing IPFIX Messages, and the latter
specifies a method for storage of IPFIX data in files. specifies a method for storage of IPFIX data in files.
More detail on the applicable architectural arrangements for More detail on the applicable architectural arrangements for
anonymisation can be found in Section 7.1. anonymization can be found in Section 7.1.
1.4. Supporting Experimentation with Anonymization
The intended status of this document is Experimental, reflecting the
experimental nature of anonymization export support. Research on
network trace anonymization techniques and attacks against them is
ongoing. Indeed, there is increasing evidence that anonymization
applied to network trace or flow data on its own is insufficient for
many data protection applications as in [Bur10]. Therefore, this
document explicitly does not recommend any particular technique or
implementation thereof.
The intention of this document is to provide a common basis for
interoperable exchange of anonymized data, furthering research in
this area, both on anonymization techniques themselves and on
the application of anonymized data to network measurement. To that
end, the classification in Section 3 and anonymization export support
in Section 6 can be used to describe and export information even
about data anonymized using techniques that are unacceptably weak for
general application to production data sets on their own.
While the specification herein is designed to be implementation- and
technique-independent, open research in this area may necessitate
future updates to the specification. Assuming the future successful
application of this specification to anonymized data publication and
exchange, it may be brought back to the IPFIX working group for
further development and publication on the standards track.
2. Terminology 2. Terminology
Terms used in this document that are defined in the Terminology Terms used in this document that are defined in the Terminology
section of the IPFIX Protocol [RFC5101] document are to be section of the IPFIX Protocol [RFC5101] document are to be
interpreted as defined there. In addition, this document defines the interpreted as defined there. In addition, this document defines the
following terms: following terms:
Anonymisation Record: A record, defined by the Anonymisation Anonymization Record: A record, defined by the Anonymization
Options Template in Section 6.1, that defines the Options Template in Section 6.1, that defines the
properties of the anonymisation applied to a single Information properties of the anonymization applied to a single Information
Element within a single Template or Options Template. Element within a single Template or Options Template.
Anonymised Data Record: A Data Record within a Data Set containing Anonymized Data Record: A Data Record within a Data Set containing
at least one Information Element with anonymised values. The at least one Information Element with anonymized values. The
Information Element(s) within the Template or Options Template Information Element(s) within the Template or Options Template
describing this Data Record SHOULD have a corresponding describing this Data Record SHOULD have a corresponding
Anonymisation Record. Anonymization Record.
Intermediate Anonymisation Process: An intermediate process which Intermediate Anonymization Process: An intermediate process which
takes Data Records and transforms them into Anonymised Data takes Data Records and transforms them into Anonymized Data
Records. Records.
Note that there is an explicit difference in this document between a Note that there is an explicit difference in this document between a
"Data Set" (which is defined as in [RFC5101]) and a "data set". When "Data Set" (which is defined as in [RFC5101]) and a "data set". When
in lower case, this term refers to any collection of data (usually, in lower case, this term refers to any collection of data (usually,
within the context of this document, flow or packet data) which may within the context of this document, flow or packet data) which may
contain identifying information and is therefore subject to contain identifying information and is therefore subject to
anonymisation. anonymization.
Note also that when the term Template is used in this document, Note also that when the term Template is used in this document,
unless otherwise noted, it applies both to Templates and Options unless otherwise noted, it applies both to Templates and Options
Templates as defined in [RFC5101]. Specifically, Anonymisation Templates as defined in [RFC5101]. Specifically, Anonymization
Records may apply to both Templates and Options Templates. Records may apply to both Templates and Options Templates.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
3. Categorisation of Anonymisation Techniques 3. Categorization of Anonymization Techniques
Anonymisation modifies a data set in order to protect the identity of Anonymization, as described by this document, is the modification of
the people or entities described by the data set from disclosure. a data set in order to protect the identity of the people or entities
With respect to network traffic data, anonymisation generally described by the data set from disclosure. With respect to network
attempts to preserve some set of properties of the network traffic traffic data, anonymization generally attempts to preserve some set
useful for a given application or applications, while ensuring the of properties of the network traffic useful for a given application
data cannot be traced back to the specific networks, hosts, or users or applications, while ensuring the data cannot be traced back to the
generating the traffic. specific networks, hosts, or users generating the traffic.
Anonymisation may be broadly classified according to two properties: Anonymization may be broadly classified according to two properties:
recoverability and countability. All anonymisation techniques map recoverability and countability. All anonymization techniques map
the real space of identifiers or values into a separate, anonymised the real space of identifiers or values into a separate, anonymized
space, according to some function. A technique is said to be space, according to some function. A technique is said to be
recoverable when the function used is invertible or can otherwise be recoverable when the function used is invertible or can otherwise be
reversed and a real identifier can be recovered from a given reversed and a real identifier can be recovered from a given
replacement identifier. replacement identifier. "Recoverability" as used within this
categorization does not refer to recoverability under attack; that is,
techniques wherein the function used can only be reversed using
additional information, such as an encryption key, or knowledge of
injected traffic within the data set, are considered non-recoverable.
Countability compares the dimension of the anonymised space (N) to Countability compares the dimension of the anonymized space (N) to
the dimension of the real space (M), and denotes how the count of the dimension of the real space (M), and denotes how the count of
unique values is preserved by the anonymisation function. If the unique values is preserved by the anonymization function. If the
anonymised space is smaller than the real space, then the function is anonymized space is smaller than the real space, then the function is
said to generalise the input, mapping more than one input point to said to generalize the input, mapping more than one input point to
each anonymous value (e.g., as with aggregation). By definition, each anonymous value (e.g., as with aggregation). By definition,
generalisation is not recoverable. generalization is not recoverable.
If the dimensions of the anonymised and real spaces are the same, If the dimensions of the anonymized and real spaces are the same,
such that the count of unique values is preserved, then the function such that the count of unique values is preserved, then the function
is said to be a direct substitution function. If the dimension of is said to be a direct substitution function. If the dimension of
the anonymised space is larger, such that each real value maps to a the anonymized space is larger, such that each real value maps to a
set of anonymised values, then the function is said to be a set set of anonymized values, then the function is said to be a set
substitution function. Note that with set substitution functions, substitution function. Note that with set substitution functions,
the sets of anonymised values are not necessarily disjoint. Either the sets of anonymized values are not necessarily disjoint. Either
direct or set substitution functions are said to be one-way if there direct or set substitution functions are said to be one-way if there
exists no non-brute force method for recovering the real data point exists no non-brute force method for recovering the real data point
from an anonymised one in isolation (i.e., if the only way to recover from an anonymized one in isolation (i.e., if the only way to recover
the data point is to attack the anonymised data set as a whole, e.g. the data point is to attack the anonymized data set as a whole, e.g.
through fingerprinting or data injection). through fingerprinting or data injection).
This classification is summarised in the table below. This classification is summarized in the table below.
+------------------------+-----------------+------------------------+ +------------------------+-----------------+------------------------+
| Recoverability / | Recoverable | Non-recoverable | | Recoverability / | Recoverable | Non-recoverable |
| Countability | | | | Countability | | |
+------------------------+-----------------+------------------------+ +------------------------+-----------------+------------------------+
| N < M | N.A. | Generalisation | | N < M | N.A. | Generalization |
| N = M | Direct | One-way Direct | | N = M | Direct | One-way Direct |
| | Substitution | Substitution | | | Substitution | Substitution |
| N > M | Set | One-way Set | | N > M | Set | One-way Set |
| | Substitution | Substitution | | | Substitution | Substitution |
+------------------------+-----------------+------------------------+ +------------------------+-----------------+------------------------+
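The countability axis of the table above can be illustrated with a small sketch. This is not part of either draft revision; the function name classify_countability and the three toy mappings are hypothetical, and the sketch assumes a finite real space whose values can be enumerated. It only compares the count of unique anonymized values (N) with the count of unique real values (M); it does not test recoverability.

```python
from typing import Callable, Hashable, Iterable, Set


def classify_countability(
    real_values: Iterable[Hashable],
    anonymize: Callable[[Hashable], Set[Hashable]],
) -> str:
    """Compare the size of the anonymized space (N) to the real space (M).

    `anonymize` (hypothetical interface) returns the set of anonymized
    values a real value may map to; a direct function returns one value.
    """
    real = set(real_values)
    anonymized = set()
    for value in real:
        anonymized |= set(anonymize(value))
    m, n = len(real), len(anonymized)
    if n < m:
        return "generalization"       # several inputs share one output
    if n == m:
        return "direct substitution"  # count of unique values preserved
    return "set substitution"         # each input maps to a set of outputs


# Toy mappings over a real space of ten values (0..9); all are assumptions.
def bin_by_five(value: int) -> Set[int]:
    return {value // 5}               # two bins: N = 2 < M = 10


def affine_permutation(value: int) -> Set[int]:
    return {(3 * value + 1) % 10}     # bijection on 0..9: N = M = 10


def two_pseudonyms(value: int) -> Set[int]:
    return {2 * value, 2 * value + 1}  # two outputs per input: N = 20 > M
```

Calling classify_countability(range(10), bin_by_five) places the binning function in the N < M row, while the other two mappings fall into the N = M and N > M rows respectively. Whether a given mapping belongs in the recoverable or non-recoverable column depends on whether the function can be reversed, which this sketch deliberately leaves out.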
4. Anonymisation of IP Flow Data 4. Anonymization of IP Flow Data
Due to the restricted semantics of IP flow data, there is a In anonymizing IP flow data as treated by this document, the goal is
relatively limited set of specific anonymisation techniques available generally two-way address untraceability: to remove the ability to
on flow data, though each falls into the broad categories above. assert that endpoint X contacted endpoint Y at time T. Address
Each type of field that may commonly appear in a flow record may have untraceability is important as IP addresses are the most suitable
its own applicable specific techniques. field in IP flow records to identify real-world entities. Each IP
address is associated with an interface on a network host, and can
potentially be identified with a single user. Additionally, IP
addresses are structured identifiers; that is, partial IP address
prefixes may be used to identify networks just as full IP addresses
identify hosts. This leads IP flow data anonymization to be
concerned first and foremost with IP address anonymization.
While anonymisation is generally applied at the resolution of single Any form of aggregation which combines flows from multiple endpoints
fields within a flow record, attacks against anonymisation use entire into a single record (e.g., aggregation by subnetwork, aggregation
removing addressing completely) may also provide address
untraceability; however, anonymization by aggregation is out of scope
for this document. Also of potential interest in this
problem space, but out of scope, are anonymization techniques which are
applied over multiple fields or multiple records in a way which
introduces dependencies among anonymized fields or records. This
document is concerned solely with anonymization techniques applied at
the resolution of single fields within a flow record.
Even so, attacks against these anonymization techniques use entire
flows and relationships between hosts and flows within a given data flows and relationships between hosts and flows within a given data
set. Therefore, fields which may not necessarily be identifying by set. Therefore, fields which may not necessarily be identifying by
themselves may be anonymised in order to increase the anonymity of themselves may be anonymized in order to increase the anonymity of
the data set as a whole. the data set as a whole.
Of all the fields in an IP flow record, IP addresses are the most Due to the restricted semantics of IP flow data, there is a
likely to be used to directly identify entities in the real world. relatively limited set of specific anonymization techniques available
Each IP address is associated with an interface on a network host, on flow data, though each falls into the broad categories discussed
and can potentially be identified with a single user. Additionally, in the previous section. Each type of field that may commonly appear
IP addresses are structured identifiers; that is, partial IP address in a flow record may have its own applicable specific techniques.
prefixes may be used to identify networks just as full IP addresses
identify hosts. This makes anonymisation of IP addresses
particularly important.
MAC addresses uniquely identify devices on the network; while they As with IP addresses, MAC addresses uniquely identify devices on the
are not often available in traffic data collected at Layer 3, and network; while they are not often available in traffic data collected
cannot be used to locate devices within the network, some traces may at Layer 3, and cannot be used to locate devices within the network,
contain sub-IP data including MAC address data. Hardware addresses some traces may contain sub-IP data including MAC address data.
may be mappable to device serial numbers, and to the entities or Hardware addresses may be mappable to device serial numbers, and to
individuals who purchased the devices, when combined with external the entities or individuals who purchased the devices, when combined
databases. MAC addresses are also often used in constructing IPv6 with external databases. MAC addresses are also often used in
addresses (see section 2.5.1 of [RFC4291]), and as such may be used constructing IPv6 addresses (see section 2.5.1 of [RFC4291]), and as
to reconstruct the low-order bits of anonymised IPv6 addresses in such may be used to reconstruct the low-order bits of anonymized IPv6
certain circumstances. Therefore, MAC address anonymisation is also addresses in certain circumstances. Therefore, MAC address
important. anonymization is also important.
Port numbers identify abstract entities (applications) as opposed to Port numbers identify abstract entities (applications) as opposed to
real-world entities, but they can be used to classify hosts and user real-world entities, but they can be used to classify hosts and user
behavior. Passive port fingerprinting, both of well-known and behavior. Passive port fingerprinting, both of well-known and
ephemeral ports, can be used to determine the operating system ephemeral ports, can be used to determine the operating system
running on a host. Relative data volumes by port can also be used to running on a host. Relative data volumes by port can also be used to
determine the host's function (workstation, web server, etc.); this determine the host's function (workstation, web server, etc.); this
information can be used to identify hosts and users. information can be used to identify hosts and users.
While not identifiers in and of themselves, timestamps and counters While not identifiers in and of themselves, timestamps and counters
can reveal the behavior of the hosts and users on a network. Any can reveal the behavior of the hosts and users on a network. Any
given network activity is recognizable by a pattern of relative time given network activity is recognizable by a pattern of relative time
differences and data volumes in the associated sequence of flows, differences and data volumes in the associated sequence of flows,
even without host address information. They can therefore be used to even without host address information. They can therefore be used to
identify hosts and users. Timestamps and counters are also identify hosts and users. Timestamps and counters are also
vulnerable to traffic injection attacks, where traffic with a known vulnerable to traffic injection attacks, where traffic with a known
pattern is injected into a network under measurement, and this pattern is injected into a network under measurement, and this
pattern is later identified in the anonymised data set. pattern is later identified in the anonymized data set.
The simplest and most extreme form of anonymisation, which can be The simplest and most extreme form of anonymization, which can be
applied to any field of a flow record, is black-marker anonymisation, applied to any field of a flow record, is black-marker anonymization,
or complete deletion of a given field. Note that black-marker or complete deletion of a given field. Note that black-marker
anonymisation is equivalent to simply not exporting the field(s) in anonymization is equivalent to simply not exporting the field(s) in
question. question.
While black-marker anonymisation completely protects the data in the While black-marker anonymization completely protects the data in the
deleted fields from the risk of disclosure, it also reduces the deleted fields from the risk of disclosure, it also reduces the
utility of the anonymised data set as a whole. Techniques that utility of the anonymized data set as a whole. Techniques that
retain some information while reducing (though not eliminating) the retain some information while reducing (though not eliminating) the
disclosure risk will be extensively discussed in the following disclosure risk will be extensively discussed in the following
sections; note that the techniques specifically applicable to IP sections; note that the techniques specifically applicable to IP
addresses, timestamps, ports, and counters will be discussed in addresses, timestamps, ports, and counters will be discussed in
separate sections. separate sections.
4.1. IP Address Anonymisation 4.1. IP Address Anonymization
Since IP addresses are the most common identifiers within flow data Since IP addresses are the most common identifiers within flow data
that can be used to directly identify a person, organization, or that can be used to directly identify a person, organization, or
host, most of the work on flow and trace data anonymisation has gone host, most of the work on flow and trace data anonymization has gone
into IP address anonymisation techniques. Indeed, the aim of most into IP address anonymization techniques. Indeed, the aim of most
attacks against anonymisation is to recover the map from anonymised attacks against anonymization is to recover the map from anonymized
IP addresses to original IP addresses thereby identifying the IP addresses to original IP addresses thereby identifying the
identified hosts. There is therefore a wide range of IP address identified hosts. There is therefore a wide range of IP address
anonymisation schemes that fit into the following categories. anonymization schemes that fit into the following categories.
+------------------------------------+---------------------+ +------------------------------------+---------------------+
| Scheme | Action | | Scheme | Action |
+------------------------------------+---------------------+ +------------------------------------+---------------------+
| Truncation | Generalisation | | Truncation | Generalization |
| Reverse Truncation | Generalisation | | Reverse Truncation | Generalization |
| Permutation | Direct Substitution | | Permutation | Direct Substitution |
| Prefix-preserving Pseudonymisation | Direct Substitution | | Prefix-preserving Pseudonymization | Direct Substitution |
+------------------------------------+---------------------+ +------------------------------------+---------------------+
4.1.1. Truncation 4.1.1. Truncation
Truncation removes "n" of the least significant bits from an IP Truncation removes "n" of the least significant bits from an IP
address, replacing them with zeroes. In effect, it replaces a host address, replacing them with zeroes. In effect, it replaces a host
address with a network address for some fixed netblock; for IPv4 address with a network address for some fixed netblock; for IPv4
addresses, 8-bit truncation corresponds to replacement with a /24 addresses, 8-bit truncation corresponds to replacement with a /24
network address. Truncation is a non-reversible generalisation network address. Truncation is a non-reversible generalization
scheme. Note that while truncation is effective for making hosts scheme. Note that while truncation is effective for making hosts
non-identifiable, it preserves information which can be used to non-identifiable, it preserves information which can be used to
identify an organization, a geographic region, a country, or a identify an organization, a geographic region, a country, or a
continent. continent.
Truncation to an address length of 0 is equivalent to black-marker Truncation to an address length of 0 is equivalent to black-marker
anonymisation. Complete removal of IP address information is only anonymization. Complete removal of IP address information is only
recommended for analysis tasks which have no need to separate flow recommended for analysis tasks which have no need to separate flow
data by host or network; e.g. as a first stage to per-application data by host or network; e.g. as a first stage to per-application
(port) or time-series total volume analyses. (port) or time-series total volume analyses.
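As an illustration of the truncation described above, the following is a minimal sketch in Python. The function name truncate_ip and its parameters are assumptions made for this example and are not defined by the draft.

```python
import ipaddress

def truncate_ip(addr: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen              # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    # Mask with ones in the retained high-order bits, zeroes in the low n bits.
    mask = ((1 << width) - 1) ^ ((1 << n) - 1)
    return str(type(ip)(int(ip) & mask))

# 8-bit truncation of an IPv4 address leaves the /24 network address.
print(truncate_ip("192.0.2.77", 8))
```

Calling truncate_ip with n equal to the full address width yields the all-zeroes address, which corresponds to the black-marker case.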
4.1.2. Reverse Truncation 4.1.2. Reverse Truncation
Reverse truncation removes "n" of the most significant bits from an Reverse truncation removes "n" of the most significant bits from an
IP address, replacing them with zeroes. Reverse truncation is a non- IP address, replacing them with zeroes. Reverse truncation is a non-
reversible generalisation scheme. Reverse truncation is effective reversible generalization scheme. Reverse truncation is effective
for making networks unidentifiable, partially or completely removing for making networks unidentifiable, partially or completely removing
information which can be used to identify an organization, a information which can be used to identify an organization, a
geographic region, a country, or a continent (or RIR region of geographic region, a country, or a continent (or RIR region of
responsibility). However, it may cause ambiguity when applied to responsibility). However, it may cause ambiguity when applied to
data collected from more than one network, since it treats all the data collected from more than one network, since it treats all the
hosts with the same address on different networks as if they are the hosts with the same address on different networks as if they are the
same host. It is not particularly useful when publishing data where same host. It is not particularly useful when publishing data where
the network of origin is known or can be easily guessed by virtue of the network of origin is known or can be easily guessed by virtue of
the identity of the publisher. the identity of the publisher.
Like truncation, reverse truncation to an address length of 0 is Like truncation, reverse truncation to an address length of 0 is
equivalent to black-marker anonymisation. equivalent to black-marker anonymization.
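The ambiguity noted above can be seen in a short Python sketch. The function name reverse_truncate_ip is an assumption for this example; the two input addresses are documentation addresses chosen to sit on different networks.

```python
import ipaddress

def reverse_truncate_ip(addr: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen              # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    # Keep only the low-order (width - n) bits.
    mask = (1 << (width - n)) - 1
    return str(type(ip)(int(ip) & mask))

# Hosts on two different networks become indistinguishable.
print(reverse_truncate_ip("192.0.2.77", 24))
print(reverse_truncate_ip("198.51.100.77", 24))
```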
4.1.3. Permutation 4.1.3. Permutation
Permutation is a direct substitution technique, replacing each IP Permutation is a direct substitution technique, replacing each IP
address with an address selected from the set of possible IP address with an address selected from the set of possible IP
addresses, such that each anonymised address represents a unique addresses, such that each anonymized address represents a unique
original address. The selection function is often random, though it original address. The selection function is often random, though it
is not necessarily so. Permutation does not preserve any structural is not necessarily so. Permutation does not preserve any structural
information about a network, but it does preserve the unique count of information about a network, but it does preserve the unique count of
IP addresses. Any application that requires more structure than IP addresses. Any application that requires more structure than
host-uniqueness will not be able to use permuted IP addresses. host-uniqueness will not be able to use permuted IP addresses.
While permutation ideally guarantees that each anonymised address There are many variations of permutation functions, each of which has
represents a unique original address, such requires significant state tradeoffs in performance, security, and guarantees of non-collision;
in the Intermediate Anonymisation Process. Therefore, permutation evaluating these tradeoffs is implementation independent. However,
may be implemented by hashing for performance reasons, with hash in general permutation functions applied to anonymization SHOULD be
functions that may have relatively small collision probabilities. difficult to reverse without knowing the parameters (e.g., a secret
Such techniques are still essentially direct substitution techniques, key for HMAC). Given the relatively small space of IPv4 addresses in
despite the nonzero error probability. particular, hash functions applied without additional parameters
could be reversed through brute force if the hash function is known,
and SHOULD NOT be used as permutation functions. Permutation
functions may guarantee noncollision (i.e., that each anonymized
address represents a unique original address), but need not; however,
the probability of collision SHOULD be low. We treat even
permutations with low but nonzero collision probability as direct
substitution nevertheless. Beyond these guidelines, recommendations
for specific permutation functions are out of scope for this
document.
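One possible keyed substitution is sketched below in Python. This is an assumption for illustration, not a function recommended by this document: it applies HMAC-SHA-256 under a secret key and keeps the first 32 bits of the output. The name permute_ipv4 and the key handling are hypothetical. Because the output is truncated to 32 bits, non-collision is not guaranteed, and collisions become likely once a data set holds tens of thousands of distinct addresses.

```python
import hashlib
import hmac
import ipaddress

def permute_ipv4(addr: str, key: bytes) -> str:
    """Substitute an IPv4 address using a keyed hash (hypothetical sketch).

    The mapping is deterministic for a given key, so each original
    address always yields the same anonymized address. Collisions
    between distinct originals are possible but have low probability
    for any single pair.
    """
    ip = ipaddress.IPv4Address(addr)
    digest = hmac.new(key, ip.packed, hashlib.sha256).digest()
    # Keep the first four bytes of the MAC as the substitute address.
    return str(ipaddress.IPv4Address(digest[:4]))

key = b"example-secret-key"   # placeholder; a real key should be random and kept secret
print(permute_ipv4("192.0.2.1", key))
```

Without knowledge of the key, an attacker who knows the function cannot simply enumerate all IPv4 addresses to rebuild the mapping, which is the brute-force risk described above for unkeyed hashes.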
4.1.4. Prefix-preserving Pseudonymisation 4.1.4. Prefix-preserving Pseudonymization
Prefix-preserving pseudonymisation is a direct substitution Prefix-preserving pseudonymization is a direct substitution
technique, like permutation but further restricted such that the technique, like permutation but further restricted such that the
structure of subnets is preserved at each level while anonymising IP structure of subnets is preserved at each level while anonymizing IP
addresses. If two real IP addresses match on a prefix of "n" bits, addresses. If two real IP addresses match on a prefix of "n" bits,
the two anonymised IP addresses will match on a prefix of "n" bits as the two anonymized IP addresses will match on a prefix of "n" bits as
well. This is useful when relationships among networks must be well. This is useful when relationships among networks must be
preserved for a given analysis task, but introduces structure into preserved for a given analysis task, but introduces structure into
the anonymised data which can be exploited in attacks against the the anonymized data which can be exploited in attacks against the
anonymisation technique. anonymization technique.
Scanning in Internet background traffic can cause particular problems Scanning in Internet background traffic can cause particular problems
with this technique: if a scanner uses a predictable and known with this technique: if a scanner uses a predictable and known
sequence of addresses, this information can be used to reverse the sequence of addresses, this information can be used to reverse the
substitution. The low order portion of the address can be left substitution. The low order portion of the address can be left
unanonymized as a partial defense against this attack. unanonymized as a partial defense against this attack.
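The prefix-matching property can be illustrated with a bit-by-bit construction in the style of Crypto-PAn, sketched below in Python. This document does not name a specific algorithm; the use of HMAC-SHA-256 as the keyed function and the names prefix_preserving_ipv4 and common_prefix_len are assumptions for this example. Each output bit is the input bit, flipped or not according to a keyed function of the preceding input bits only.

```python
import hashlib
import hmac
import ipaddress

def prefix_preserving_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving substitution of an IPv4 address (hypothetical sketch)."""
    x = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = x >> (32 - i)                 # the i most significant input bits
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        # One pseudorandom bit that depends only on the key and the prefix.
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (x >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

key = b"example-secret-key"   # placeholder key
a = prefix_preserving_ipv4("192.0.2.1", key)
b = prefix_preserving_ipv4("192.0.2.200", key)
print(common_prefix_len("192.0.2.1", "192.0.2.200"), common_prefix_len(a, b))
```

Since the flip applied at bit position i depends only on the first i input bits, two addresses that agree on "n" bits receive identical flips on those bits, and their anonymized forms agree on exactly the same number of leading bits.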
4.2. MAC Address Anonymisation 4.2. MAC Address Anonymization
Flow data containing sub-IP information can also contain identifying Flow data containing sub-IP information can also contain identifying
information in the form of the hardware (MAC) address. While MAC information in the form of the hardware (MAC) address. While MAC
address information cannot be used to locate a node within a network, address information cannot be used to locate a node within a network,
it can be used to directly uniquely identify a specific device. it can be used to directly uniquely identify a specific device.
Vendors or organizations within the supply chain may then have the Vendors or organizations within the supply chain may then have the
information necessary to identify the entity or individual that information necessary to identify the entity or individual that
purchased the device. purchased the device.
MAC address information is not as structured as IP address MAC address information is not as structured as IP address
skipping to change at page 12, line 8 skipping to change at page 13, line 18
Note that MAC address information also appears within IPv6 addresses, Note that MAC address information also appears within IPv6 addresses,
as the EUI-64 address, or EUI-48 address encoded as an EUI-64 as the EUI-64 address, or EUI-48 address encoded as an EUI-64
address, is used as the least significant 64 bits of the IPv6 address address, is used as the least significant 64 bits of the IPv6 address
in the case of link local addressing or stateless autoconfiguration; in the case of link local addressing or stateless autoconfiguration;
the considerations and techniques in this section may then apply to the considerations and techniques in this section may then apply to
such IPv6 addresses as well. such IPv6 addresses as well.
+-----------------------------+---------------------+ +-----------------------------+---------------------+
| Scheme | Action | | Scheme | Action |
+-----------------------------+---------------------+ +-----------------------------+---------------------+
| Truncation | Generalisation | | Truncation | Generalization |
| Reverse Truncation | Generalisation | | Reverse Truncation | Generalization |
| Permutation | Direct Substitution | | Permutation | Direct Substitution |
| Structured Pseudonymisation | Direct Substitution | | Structured Pseudonymization | Direct Substitution |
+-----------------------------+---------------------+ +-----------------------------+---------------------+
4.2.1. Truncation 4.2.1. Truncation
Truncation removes "n" of the least significant bits from a MAC Truncation removes "n" of the least significant bits from a MAC
address, replacing them with zeroes. In effect, it retains bits of address, replacing them with zeroes. In effect, it retains bits of
OUI, which identifies the manufacturer, while removing the least OUI, which identifies the manufacturer, while removing the least
significant bits identifying the particular device. Truncation of 24 significant bits identifying the particular device. Truncation of 24
bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the
device identifier while retaining the OUI. device identifier while retaining the OUI.
Truncation is effective for making device manufacturers partially or Truncation is effective for making device manufacturers partially or
completely identifiable within a dataset while deleting unique host completely identifiable within a dataset while deleting unique host
identifiers; this can be used to retain and aggregate MAC layer identifiers; this can be used to retain and aggregate MAC layer
behavior by vendor. behavior by vendor.
Truncation to an address length of 0 is equivalent to black-marker Truncation to an address length of 0 is equivalent to black-marker
anonymisation. anonymization.
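For a 48-bit MAC address, the truncation described above might be sketched in Python as follows. The function name truncate_mac and the colon-separated input format are assumptions for this example, and the sample address is arbitrary.

```python
def truncate_mac(mac: str, n: int) -> str:
    """Zero the n least significant bits of a 48-bit MAC address.

    Hypothetical helper; expects six colon-separated hex octets.
    """
    if not 0 <= n <= 48:
        raise ValueError("n must be between 0 and 48")
    value = int(mac.replace(":", ""), 16)
    mask = ((1 << 48) - 1) ^ ((1 << n) - 1)
    value &= mask
    # Re-encode as six octets, most significant first.
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

# 24-bit truncation keeps the OUI and zeroes the device identifier.
print(truncate_mac("00:1b:21:3c:9d:f8", 24))
```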
4.2.2. Reverse Truncation 4.2.2. Reverse Truncation
Reverse truncation removes "n" of the most significant bits from a Reverse truncation removes "n" of the most significant bits from a
MAC address, replacing them with zeroes. Reverse truncation is a MAC address, replacing them with zeroes. Reverse truncation is a
non-reversible generalisation scheme. This has the effect of non-reversible generalization scheme. This has the effect of
removing bits of the OUI, which identify manufacturers, before removing bits of the OUI, which identify manufacturers, before
removing the least significant bits. Reverse truncation of 24 bits removing the least significant bits. Reverse truncation of 24 bits
zeroes out the OUI. zeroes out the OUI.
Reverse truncation is effective for making device manufacturers Reverse truncation is effective for making device manufacturers
partially or completely unidentifiable within a dataset. However, it partially or completely unidentifiable within a dataset. However, it
may cause ambiguity by introducing the possibility of truncated MAC may cause ambiguity by introducing the possibility of truncated MAC
address collision. Also note that the utility of removing address collision. Also note that the utility of removing
manufacturer information is not particularly well-covered by the manufacturer information is not particularly well-covered by the
literature. literature.
Reverse truncation to an address length of 0 is equivalent to black- Reverse truncation to an address length of 0 is equivalent to black-
marker anonymisation. marker anonymization.
4.2.3. Permutation 4.2.3. Permutation
Permutation is a direct substitution technique, replacing each MAC Permutation is a direct substitution technique, replacing each MAC
address with an address selected from the set of possible MAC address with an address selected from the set of possible MAC
addresses, such that each anonymised address represents a unique addresses, such that each anonymized address represents a unique
original address. The selection function is often random, though it original address. The selection function is often random, though it
is not necessarily so. Permutation does not preserve any structural is not necessarily so. Permutation does not preserve any structural
information about a network, but it does preserve the unique count of information about a network, but it does preserve the unique count of
devices on the network. Any application that requires more structure devices on the network. Any application that requires more structure
than host-uniqueness will not be able to use permuted MAC addresses. than host-uniqueness will not be able to use permuted MAC addresses.
While permutation ideally guarantees that each anonymised address There are many variations of permutation functions, each of which has
represents a unique original address, such requires significant state tradeoffs in performance, security, and guarantees of non-collision;
in the Intermediate Anonymisation Process. Therefore, permutation evaluating these tradeoffs is implementation independent. However,
may be implemented by hashing for performance reasons, with hash in general permutation functions applied to anonymization SHOULD be
functions that may have relatively small collision probabilities. difficult to reverse without knowing the parameters (e.g., a secret
Such techniques are still essentially direct substitution techniques, key for HMAC). While the EUI-48 space is larger than the IPv4
despite the nonzero error probability. address space, hash functions applied without additional parameters
could be reversed through brute force if the hash function is known,
and SHOULD NOT be used as permutation functions. Permutation
functions may guarantee noncollision (i.e., that each anonymized
address represents a unique original address), but need not; however,
the probability of collision SHOULD be low. We treat even
permutations with low but nonzero collision probability as direct
substitution nevertheless. Beyond these guidelines, recommendations
for specific permutation functions are out of scope for this
document.
4.2.4. Structured Pseudonymisation 4.2.4. Structured Pseudonymization
Structured pseudonymisation for MAC addresses is a direct Structured pseudonymization for MAC addresses is a direct
substitution technique, like permutation, but restricted such that substitution technique, like permutation, but restricted such that
the OUI (the most significant three bytes) is permuted separately the OUI (the most significant three bytes) is permuted separately
from the node identifier, the remainder. This is useful when the from the node identifier, the remainder. This is useful when the
uniqueness of OUIs must be preserved for a given analysis task, but uniqueness of OUIs must be preserved for a given analysis task, but
introduces structure into the anonymised data which can be exploited introduces structure into the anonymized data which can be exploited
in attacks against the anonymisation technique. in attacks against the anonymization technique.
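A sketch of this separation in Python follows. It is an assumption for illustration: the two halves are substituted independently with HMAC-SHA-256 under one secret key, using distinct labels so that the two mappings are unrelated. The name pseudonymize_mac is hypothetical. Each half is only 24 bits wide, so collisions become likely once a few thousand distinct values are present; a stateful mapping would be needed to rule them out.

```python
import hashlib
import hmac

def pseudonymize_mac(mac: str, key: bytes) -> str:
    """Substitute the OUI and node identifier of a 48-bit MAC separately.

    Hypothetical sketch; expects six colon-separated hex octets.
    """
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    oui, node = raw[:3], raw[3:]
    # Distinct labels keep the OUI mapping independent of the node mapping.
    new_oui = hmac.new(key, b"oui" + oui, hashlib.sha256).digest()[:3]
    new_node = hmac.new(key, b"node" + node, hashlib.sha256).digest()[:3]
    return ":".join(f"{octet:02x}" for octet in new_oui + new_node)

key = b"example-secret-key"   # placeholder key
print(pseudonymize_mac("00:1b:21:3c:9d:f8", key))
print(pseudonymize_mac("00:1b:21:00:00:01", key))   # same OUI, same substituted OUI
```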
4.3. Timestamp Anonymisation 4.3. Timestamp Anonymization
The particular time at which a flow began or ended is not The particular time at which a flow began or ended is not
particularly identifiable information, but it can be used as part of particularly identifiable information, but it can be used as part of
attacks against other anonymisation techniques or for user profiling. attacks against other anonymization techniques or for user profiling,
Precise timestamps can be used in injected-traffic fingerprinting e.g. as in [Mur07]. Timestamps can be used in traffic injection
attacks, which use known information about a set of traffic generated attacks, which use known information about a set of traffic generated
or otherwise known by an attacker to recover mappings of other or otherwise known by an attacker to recover mappings of other
anonymised fields, as well as to identify certain activity by anonymized fields, as well as to identify certain activity by
response delay and size fingerprinting, which compares response sizes response delay and size fingerprinting, which compares response sizes
and inter-flow times in anonymised data to known values. Therefore, and inter-flow times in anonymized data to known values. Note that
timestamp information may be anonymised in order to ensure the these attacks have been shown to be relatively robust against
protection of the entire data set. timestamp anonymization techniques (see [Bur10]), so the techniques
presented in this section are relatively weak and should be used with
care.
+-----------------------+----------------------------+ +-----------------------+----------------------------+
| Scheme | Action | | Scheme | Action |
+-----------------------+----------------------------+ +-----------------------+----------------------------+
| Precision Degradation | Generalisation | | Precision Degradation | Generalization |
| Enumeration | Direct or Set Substitution | | Enumeration | Direct or Set Substitution |
| Random Shifts | Direct Substitution | | Random Shifts | Direct Substitution |
+-----------------------+----------------------------+ +-----------------------+----------------------------+
4.3.1. Precision Degradation 4.3.1. Precision Degradation
Precision Degradation is a generalisation technique that removes the Precision Degradation is a generalization technique that removes the
most precise components of a timestamp, accounting all events most precise components of a timestamp, accounting all events
occurring in each given interval (e.g. one millisecond for occurring in each given interval (e.g. one millisecond for
millisecond level degradation) as simultaneous. This has the effect millisecond level degradation) as simultaneous. This has the effect
of potentially collapsing many timestamps into one. With this of potentially collapsing many timestamps into one. With this
technique time precision is reduced, and sequencing may be lost, but technique time precision is reduced, and sequencing may be lost, but
the information at which time the event occurred is preserved. The the information at which time the event occurred is preserved. The
anonymised data may not be generally useful for applications which anonymized data may not be generally useful for applications which
require strict sequencing of flows. require strict sequencing of flows.
Note that flow meters with low time precision (e.g. second precision, Note that flow meters with low time precision (e.g. second precision,
or millisecond precision on high-capacity networks) perform the or millisecond precision on high-capacity networks) perform the
equivalent of precision degradation anonymisation by their design. equivalent of precision degradation anonymization by their design.
Note also that degradation to a very low precision (e.g. on the order Note also that degradation to a very low precision (e.g. on the order
of minutes, hours, or days) is commonly used in analyses operating on of minutes, hours, or days) is commonly used in analyses operating on
time-series aggregated data, and may also be described as binning; time-series aggregated data, and may also be described as binning;
though the time scales are longer and applicability more restricted, though the time scales are longer and applicability more restricted,
this is in principle the same operation. this is in principle the same operation.
Precision degradation to infinitely low precision is equivalent to Precision degradation to infinitely low precision is equivalent to
black-marker anonymisation. Removal of timestamp information is only black-marker anonymization. Removal of timestamp information is only
recommended for analysis tasks which have no need to separate flows recommended for analysis tasks which have no need to separate flows
in time, for example for counting total volumes or unique occurrences in time, for example for counting total volumes or unique occurrences
of other flow keys in an entire dataset. of other flow keys in an entire dataset.
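Precision degradation amounts to rounding each timestamp down to the start of its interval. A minimal Python sketch, assuming timestamps are integer milliseconds (the name degrade_precision and the units are assumptions for this example):

```python
def degrade_precision(timestamp_ms: int, interval_ms: int) -> int:
    """Round a millisecond timestamp down to the start of its interval."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return timestamp_ms - (timestamp_ms % interval_ms)

# Two events within the same second collapse to one timestamp.
print(degrade_precision(5_123, 1_000), degrade_precision(5_999, 1_000))
```

The same function performs binning when interval_ms is set to minutes or hours expressed in milliseconds.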
4.3.2. Enumeration 4.3.2. Enumeration
Enumeration is a substitution function that retains the chronological Enumeration is a substitution function that retains the chronological
order in which events occurred while eliminating time information. order in which events occurred while eliminating time information.
Timestamps are substituted by equidistant timestamps (or numbers) Timestamps are substituted by equidistant timestamps (or numbers)
starting from a randomly chosen start value. The resulting data is starting from a randomly chosen start value. The resulting data is
useful for applications requiring strict sequencing, but not for useful for applications requiring strict sequencing, but not for
those requiring good timing information (e.g. delay- or jitter- those requiring good timing information (e.g. delay- or jitter-
measurement for quality-of-service (QoS) applications or service- measurement for quality-of-service (QoS) applications or service-
level agreement (SLA) validation). level agreement (SLA) validation).
Note that enumeration is functionally equivalent to precision
degradation in any environment into which traffic can be regularly
injected to serve as a clock at the precision of the frequency of the
injected flows.
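Enumeration can be sketched in Python as follows. The name enumerate_timestamps, the step parameter, and the range of the random start value are assumptions for this example. Equal original timestamps map to the same substitute here, which makes this a direct substitution.

```python
import random

def enumerate_timestamps(timestamps, step=1, start=None):
    """Replace timestamps with equidistant values that keep their order.

    Hypothetical sketch: distinct timestamps are ranked, and rank k
    is mapped to start + k * step.
    """
    if start is None:
        start = random.SystemRandom().randrange(1 << 32)
    distinct = sorted(set(timestamps))
    mapping = {t: start + k * step for k, t in enumerate(distinct)}
    return [mapping[t] for t in timestamps]

# A fixed start is passed here only to make the output reproducible.
print(enumerate_timestamps([105, 100, 250, 100], start=1000))
```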
4.3.3. Random Shifts 4.3.3. Random Shifts
Random time shifts add a random offset to every timestamp within a Random time shifts add a random offset to every timestamp within a
dataset. This reversible substitution technique therefore retains dataset. This reversible substitution technique therefore retains
duration and inter-event interval information as well as duration and inter-event interval information as well as
chronological order of flows. It is primarily intended to defeat chronological order of flows. Random time shifts are quite weak, and
traffic injection fingerprinting attacks. relatively easy to reverse in the presence of external knowledge
about traffic on the measured network.
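A random time shift might be sketched as below, assuming a single offset drawn once per dataset (which is what lets durations and inter-event intervals survive). The function name and the `max_shift` bound are hypothetical and chosen only for illustration.

```python
import random

def shift_timestamps(timestamps, max_shift=86400, rng=None):
    """Add one random offset to every timestamp in a dataset.

    Returns the shifted timestamps and the offset. The offset acts as
    the key to the shift and would not be published with the data.
    """
    rng = rng or random.Random()
    offset = rng.randint(-max_shift, max_shift)
    return [t + offset for t in timestamps], offset
```

Because every value moves by the same amount, subtracting the offset restores the original data exactly, which is why knowledge of a single true timestamp is enough to reverse the technique.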
4.4. Counter Anonymisation 4.4. Counter Anonymization
Counters (such as packet and octet volumes per flow) are subject to Counters (such as packet and octet volumes per flow) are subject to
fingerprinting and injection attacks against anonymisation, or for fingerprinting and injection attacks against anonymization, or for
user profiling as timestamps are. Counter anonymisation can help user profiling as timestamps are. Data sets with anonymized counters
defeat these attacks, but are only usable for analysis tasks for are useful only for analysis tasks for which relative or imprecise
which relative or imprecise magnitudes of activity are useful. magnitudes of activity are useful. Counter information can also be
Counter information can also be completely removed, but this is only completely removed, but this is only recommended for analysis tasks
recommended for analysis tasks which have no need to evaluate the which have no need to evaluate the removed counter, for example for
removed counter, for example for counting only unique occurrences of counting only unique occurrences of other flow keys.
other flow keys.
+-----------------------+----------------------------+ +-----------------------+----------------------------+
| Scheme | Action | | Scheme | Action |
+-----------------------+----------------------------+ +-----------------------+----------------------------+
| Precision Degradation | Generalisation | | Precision Degradation | Generalization |
| Binning | Generalisation | | Binning | Generalization |
| Random noise addition | Direct or Set Substitution | | Random noise addition | Direct or Set Substitution |
+-----------------------+----------------------------+ +-----------------------+----------------------------+
4.4.1. Precision Degradation 4.4.1. Precision Degradation
As with precision degradation in timestamps, precision degradation of As with precision degradation in timestamps, precision degradation of
counters removes lower-order bits of the counters, treating all the counters removes lower-order bits of the counters, treating all the
counters in a given range as having the same value. Depending on the counters in a given range as having the same value. Depending on the
precision reduction, this loses information about the relationships precision reduction, this loses information about the relationships
between sizes of similarly-sized flows, but keeps relative magnitude between sizes of similarly-sized flows, but keeps relative magnitude
information. Precision degradation to an infinitely low precision is information. Precision degradation to an infinitely low precision is
equivalent to black-marker anonymisation. equivalent to black-marker anonymization.
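As a rough sketch of counter precision degradation, the lower-order bits can be cleared with a pair of shifts. The function name and the `dropped_bits` parameter are assumptions for this example.

```python
def degrade_counter(value, dropped_bits):
    """Clear the lowest `dropped_bits` bits of a non-negative counter."""
    if value < 0 or dropped_bits < 0:
        raise ValueError("value and dropped_bits must be non-negative")
    return (value >> dropped_bits) << dropped_bits
```

With 8 bits dropped, counters of 1300 and 1500 octets both become 1280, so their relative order is lost, while a 70000-octet flow (69888 after degradation) remains clearly larger.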
4.4.2. Binning 4.4.2. Binning
Binning can be seen as a special case of precision degradation; the Binning can be seen as a special case of precision degradation; the
operation is identical, except that in precision degradation the operation is identical, except that in precision degradation the
counter ranges are uniform, and in binning they need not be. For counter ranges are uniform, and in binning they need not be. For
example, a common counter binning scheme for packet counters could be example, consider separating unopened TCP connections from
to bin values 1-2 together, and 3-infinity together, thereby potentially opened TCP connections. Here, packet counters per flow
separating potentially completely-opened TCP connections from would be binned into two bins, one for 1-2 packet flows, and one for
unopened ones. Binning schemes are generally chosen to keep flows with 3 or more packets. Binning schemes are generally chosen
precisely the amount of information required in a counter for a given to keep precisely the amount of information required in a counter for
analysis task. Note that, also unlike precision degradation, the bin a given analysis task. Note that, also unlike precision degradation,
label need not be within the bin's range. Binning counters to a the bin label need not be within the bin's range. Binning counters
single bin is equivalent to black-marker anonymisation. to a single bin is equivalent to black-marker anonymization.
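The packet-counter example above could be sketched as a lookup over sorted lower bounds. The function name, the bound list, and the text labels are hypothetical; the labels are deliberately not numbers within the bins, since a bin label need not lie within the bin's range.

```python
import bisect

def bin_counter(value, lower_bounds, labels):
    """Map a counter to the label of the bin whose range contains it.

    `lower_bounds` is a sorted list of inclusive lower bounds; the last
    bin is open-ended.
    """
    idx = bisect.bisect_right(lower_bounds, value) - 1
    if idx < 0:
        raise ValueError("value below the lowest bin")
    return labels[idx]

# Bins from the example: 1-2 packets, and 3 or more packets.
PACKET_BOUNDS = [1, 3]
PACKET_LABELS = ["unopened", "potentially-opened"]
```

Passing a single bound with a single label collapses every counter into one bin, which corresponds to the black-marker case mentioned above.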
4.4.3. Random Noise Addition 4.4.3. Random Noise Addition
Random noise addition adds a random amount to a counter in each flow; Random noise addition adds a random amount to a counter in each flow;
this is used to keep relative magnitude information and minimize the this is used to keep relative magnitude information and minimize the
disruption to size relationship information while avoiding disruption to size relationship information while avoiding
fingerprinting attacks against anonymisation. Note that there is no fingerprinting attacks against anonymization. Note that there is no
guarantee that random noise addition will maintain ranking order by a guarantee that random noise addition will maintain ranking order by a
counter among members of a set. Random noise addition is counter among members of a set. Random noise addition is
particularly useful when the derived analysis data will not be particularly useful when the derived analysis data will not be
presented in such a way as to require the lower-order bits of the presented in such a way as to require the lower-order bits of the
counters. counters.
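Random noise addition might look like the sketch below. The uniform distribution, the `max_noise` bound, and the clamping at zero are assumptions made for illustration; the draft does not prescribe a noise distribution.

```python
import random

def add_noise(counter, max_noise, rng=None):
    """Add uniform integer noise in [-max_noise, max_noise] to a counter.

    Assumption: results are clamped at zero so that a counter never
    becomes negative.
    """
    rng = rng or random.Random()
    return max(0, counter + rng.randint(-max_noise, max_noise))
```

Two counters that differ by less than twice `max_noise` may swap order after noise is added, which illustrates why ranking order is not guaranteed to be maintained.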
4.5. Anonymisation of Other Flow Fields 4.5. Anonymization of Other Flow Fields
Other fields, particularly port numbers and protocol numbers, can be Other fields, particularly port numbers and protocol numbers, can be
used to partially identify the applications that generated the used to partially identify the applications that generated the
traffic in a given flow trace. This information can be used in traffic in a given flow trace. This information can be used in
fingerprinting attacks, and may be of interest on its own (e.g., to fingerprinting attacks, and may be of interest on its own (e.g., to
reveal that a certain application with suspected vulnerabilities is reveal that a certain application with suspected vulnerabilities is
running on a given network). These fields are generally anonymised running on a given network). These fields are generally anonymized
using one of two techniques. using one of two techniques.
+-------------+---------------------+ +-------------+---------------------+
| Scheme | Action | | Scheme | Action |
+-------------+---------------------+ +-------------+---------------------+
| Binning | Generalisation | | Binning | Generalization |
| Permutation | Direct Substitution | | Permutation | Direct Substitution |
+-------------+---------------------+ +-------------+---------------------+
4.5.1. Binning 4.5.1. Binning
Binning is a generalisation technique mapping a set of potentially Binning is a generalization technique mapping a set of potentially
non-uniform ranges into a set of arbitrarily labeled bins. Common non-uniform ranges into a set of arbitrarily labeled bins. Common
bin arrangements depend on the field type and the analysis bin arrangements depend on the field type and the analysis
application. For example, an IP protocol bin arrangement may application. For example, an IP protocol bin arrangement may
preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
other protocols into a single bin, to mitigate the use of uncommon other protocols into a single bin, to mitigate the use of uncommon
protocols in fingerprinting attacks. Another example arrangement may protocols in fingerprinting attacks. Another example arrangement may
bin source and destination ports into low (0-1023) and high (1024- bin source and destination ports into low (0-1023) and high (1024-
65535) bins in order to tell service from ephemeral ports without 65535) bins in order to tell service from ephemeral ports without
identifying individual applications. identifying individual applications.
Binning other flow key fields to a single bin is equivalent to black- Binning other flow key fields to a single bin is equivalent to black-
marker anonymisation. Removal of other flow key information is only marker anonymization. Removal of other flow key information is only
recommended for analysis tasks which have no need to differentiate recommended for analysis tasks which have no need to differentiate
flows on the removed keys, for example for total traffic counts or flows on the removed keys, for example for total traffic counts or
unique counts of other flow keys. unique counts of other flow keys.
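The two example arrangements in Section 4.5.1 can be sketched as simple functions. The bin labels used here (255 for all other protocols, and the lowest port of each range for ports) are assumptions chosen for the example; any arbitrary labels would serve.

```python
PRESERVED_PROTOCOLS = {1, 6, 17}   # ICMP, TCP, UDP
OTHER_PROTOCOL_LABEL = 255         # hypothetical label for the catch-all bin

def bin_protocol(proto):
    """Keep ICMP, TCP, and UDP; map every other protocol to one bin."""
    if not 0 <= proto <= 255:
        raise ValueError("protocol number out of range")
    return proto if proto in PRESERVED_PROTOCOLS else OTHER_PROTOCOL_LABEL

def bin_port(port):
    """Map ports to a low (0-1023) or high (1024-65535) bin."""
    if not 0 <= port <= 65535:
        raise ValueError("port number out of range")
    return 0 if port <= 1023 else 1024
```

After binning, a flow to port 443 and a flow to port 22 are indistinguishable by port, but both can still be told apart from traffic to an ephemeral port.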
4.5.2. Permutation 4.5.2. Permutation
Permutation is a direct substitution technique, replacing each value Permutation is a direct substitution technique, replacing each value
with a value selected from the set of possible values, such that each with a value selected from the set of possible values, such that each
anonymised value represents a unique original value. This is used to anonymized value represents a unique original value. This is used to
preserve the count of unique values without preserving information preserve the count of unique values without preserving information
about, or the ordering of, the values themselves. about, or the ordering of, the values themselves.
While permutation ideally guarantees that each anonymised value While permutation ideally guarantees that each anonymized value
represents a unique original value, such may require significant represents a unique original value, such may require significant
state in the Intermediate Anonymisation Process. Therefore, state in the Intermediate Anonymization Process. Therefore,
permutation may be implemented by hashing for performance reasons, permutation may be implemented by hashing for performance reasons,
with hash functions that may have relatively small collision with hash functions that may have relatively small collision
probabilities. Such techniques are still essentially direct probabilities. Such techniques are still essentially direct
substitution techniques, despite the nonzero error probability. substitution techniques, despite the nonzero error probability.
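The tradeoff between a stateful permutation and a hash-based approximation can be illustrated for 16-bit port numbers. Both functions below are sketches under stated assumptions: the names are hypothetical, the table is shuffled with Python's non-cryptographic `random` module purely for demonstration, and the truncated HMAC-SHA-256 is just one possible keyed hash.

```python
import hashlib
import hmac
import random

def make_port_permutation(seed):
    """Build a full lookup table: a true permutation of 0..65535.

    Collision-free, but requires state for every possible value.
    random.Random is NOT cryptographically secure; demonstration only.
    """
    table = list(range(65536))
    random.Random(seed).shuffle(table)
    return table

def hash_port(port, key):
    """Stateless approximation: keyed hash truncated to 16 bits.

    Distinct ports may collide, so this is not a true permutation.
    """
    digest = hmac.new(key, port.to_bytes(2, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")
```

The table guarantees that each anonymized value represents a unique original value at the cost of 65536 entries of state; the hash needs only the key but accepts a nonzero collision probability.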
5. Parameters for the Description of Anonymisation Techniques 5. Parameters for the Description of Anonymization Techniques
This section details the abstract parameters used to describe the This section details the abstract parameters used to describe the
anonymisation techniques examined in the previous section, on a per- anonymization techniques examined in the previous section, on a per-
parameter basis. These parameters and their export safety inform the parameter basis. These parameters and their export safety inform the
design of the IPFIX anonymisation metadata export specified in the design of the IPFIX anonymization metadata export specified in the
following section. following section.
5.1. Stability 5.1. Stability
A stable anonymisation will always map a given value in the real A stable anonymization will always map a given value in the real
space to a given value in the anonymised space, while an unstable space to a given value in the anonymized space, while an unstable
anonymisation will change this mapping over time; a completely anonymization will change this mapping over time; a completely
unstable anonymisation is essentially indistinguishable from black- unstable anonymization is essentially indistinguishable from black-
marker anonymisation. Any given anonymisation technique may be marker anonymization. Any given anonymization technique may be
applied with a varying range of stability. Stability is important applied with a varying range of stability. Stability is important
for assessing the comparability of anonymised information in for assessing the comparability of anonymized information in
different data sets, or in the same data set over different time different data sets, or in the same data set over different time
periods. In practice, an anonymisation may also be stable for every periods. In practice, an anonymization may also be stable for every
data set published by a particular producer to a particular data set published by a particular producer to a particular
consumer, stable for a stated time period within a dataset or across consumer, stable for a stated time period within a dataset or across
datasets, or stable only for a single data set. datasets, or stable only for a single data set.
If no information about stability is available, users of anonymised If no information about stability is available, users of anonymized
data MAY assume that the techniques used are stable across the entire data MAY assume that the techniques used are stable across the entire
dataset, but unstable across datasets. Note that stability presents dataset, but unstable across datasets. Note that stability presents
a risk-utility tradeoff, as completely stable anonymisation can be a risk-utility tradeoff, as completely stable anonymization can be
used for longer-term trend analysis tasks but also presents more risk used for longer-term trend analysis tasks but also presents more risk
of attack given the stable mapping. Information about the stability of attack given the stable mapping. Information about the stability
of a mapping SHOULD be exported along with the anonymised data. of a mapping SHOULD be exported along with the anonymized data.
5.2. Truncation Length 5.2. Truncation Length
Truncation and precision degradation are described by the truncation Truncation and precision degradation are described by the truncation
length, or the amount of data still remaining in the anonymised field length, or the amount of data still remaining in the anonymized field
after anonymisation. after anonymization.
Truncation length can generally be inferred from a given data set, Truncation length can generally be inferred from a given data set,
and need not be specially exported or protected. For bit-level and need not be specially exported or protected. For bit-level
truncation, the truncated bits are generally inferable by the least truncation, the truncated bits are generally inferable by the least
significant bit set for an instance of an Information Element significant bit set for an instance of an Information Element
described by a given Template (or the most significant bit set, in described by a given Template (or the most significant bit set, in
the case of reverse truncation). For precision degradation, the the case of reverse truncation). For precision degradation, the
truncation is inferable from the maximum precision given. Note that truncation is inferable from the maximum precision given. Note that
while this inference method is generally applicable, it is data- while this inference method is generally applicable, it is data-
dependent: there is no guarantee that it will recover the exact dependent: there is no guarantee that it will recover the exact
skipping to change at page 18, line 37 skipping to change at page 20, line 13
length alongside the address. length alongside the address.
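The inference of bit-level truncation from the least significant bit set, as described in Section 5.2, might be sketched as follows. The function name and the `width` parameter are assumptions for this example.

```python
def infer_truncated_bits(values, width=32):
    """Estimate how many low-order bits were truncated (zeroed).

    Finds the least significant bit set across all observed values.
    The result is an upper bound on the number of truncated bits: data
    that happens to have zeros in its retained low bits will make the
    estimate too large.
    """
    combined = 0
    for v in values:
        combined |= v
    if combined == 0:
        return width  # no bits set anywhere in the observed values
    # Isolate the lowest set bit and return its zero-based position.
    return (combined & -combined).bit_length() - 1
```

For the two addresses 192.0.2.0 and 198.51.100.0, both truncated by 8 bits, this estimate is 9 rather than 8 because neither address happens to have bit 8 set; adding 192.0.1.0 to the sample brings the estimate down to 8. This illustrates why the inference is data-dependent.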
5.3. Bin Map 5.3. Bin Map
Binning is described by the specification of a bin mapping function. Binning is described by the specification of a bin mapping function.
This function can be generally expressed in terms of an associative This function can be generally expressed in terms of an associative
array that maps each point in the original space to a bin, although array that maps each point in the original space to a bin, although
from an implementation standpoint most bin functions are much simpler from an implementation standpoint most bin functions are much simpler
and more efficient. and more efficient.
Since knowledge of the bin mapping function can be used to partially Since the bin map for a bin mapping function is in essence the bin
deanonymise binned data, depending on the degree of generalisation, mapping key, and can be used to partially deanonymize binned data,
information about the bin mapping function SHOULD NOT be exported. depending on the degree of generalization, information about the bin
mapping function SHOULD NOT be exported.
5.4. Permutation 5.4. Permutation
Like binning, permutation is described by the specification of a Like binning, permutation is described by the specification of a
permutation function. In the general case, this can be expressed in permutation function. In the general case, this can be expressed in
terms of an associative array that maps each point in the original terms of an associative array that maps each point in the original
space to a point in the anonymised space. Unlike binning, each point space to a point in the anonymized space. Unlike binning, each point
in the anonymised space corresponds to a single, unique point in the in the anonymized space corresponds to a single, unique point in the
original space. original space.
Since knowledge of the permutation function may, depending on the Since the parameters of the permutation function are in essence key-
function, be used to completely deanonymise permuted data, like (indeed, for cryptographic permutation functions, they are the
information about the permutation function or its parameters SHOULD keys themselves), information about the permutation function or its
NOT be exported. parameters SHOULD NOT be exported.
5.5. Shift Amount 5.5. Shift Amount
Shifting requires an amount to shift each value by. Since the shift Shifting requires an amount to shift each value by. Since the shift
amount can be used to deanonymise data protected by shifting, amount is the only key to a shift function, and can be used to
information about the shift amount SHOULD NOT be exported. trivially deanonymize data protected by shifting, information about
the shift amount SHOULD NOT be exported.
6. Anonymisation Export Support in IPFIX 6. Anonymization Export Support in IPFIX
Anonymised data exported via IPFIX SHOULD be annotated with Anonymized data exported via IPFIX SHOULD be annotated with
anonymisation metadata, which details which fields described by which anonymization metadata, which details which fields described by which
Templates are anonymised, and provides appropriate information on the Templates are anonymized, and provides appropriate information on the
anonymisation techniques used. This metadata SHOULD be exported in anonymization techniques used. This metadata SHOULD be exported in
Data Records described by the recommended Options Templates described Data Records described by the recommended Options Templates described
in this section; these Options Templates use the additional in this section; these Options Templates use the additional
Information Elements described in the following subsection. Information Elements described in the following subsection.
Note that fields anonymised using the black-marker (removal) Note that fields anonymized using the black-marker (removal)
technique do not require any special metadata support: black-marker technique do not require any special metadata support: black-marker
anonymised fields SHOULD NOT be exported at all, by omitting the anonymized fields SHOULD NOT be exported at all, by omitting the
corresponding Information Elements from the Template describing the Data corresponding Information Elements from the Template describing the Data
Set. In the case where application requirements dictate that a black- Set. In the case where application requirements dictate that a black-
marker anonymised field must remain in a Template, then an Exporting marker anonymized field must remain in a Template, then an Exporting
Process MAY export black-marker anonymised fields with their native Process MAY export black-marker anonymized fields with their native
length as all-zeros, but only in cases where enough contextual length as all-zeros, but only in cases where enough contextual
information exists within the record to differentiate a black-marker information exists within the record to differentiate a black-marker
anonymised field exported in this way from a real zero value. anonymized field exported in this way from a real zero value.
6.1. Anonymisation Records and the Anonymisation Options Template 6.1. Anonymization Records and the Anonymization Options Template
The Anonymisation Options Template describes Anonymisation Records, The Anonymization Options Template describes Anonymization Records,
which allow anonymisation metadata to be exported inline over IPFIX which allow anonymization metadata to be exported inline over IPFIX
or stored in an IPFIX File, by binding information about or stored in an IPFIX File, by binding information about
anonymisation techniques to Information Elements within defined anonymization techniques to Information Elements within defined
Templates or Options Templates. IPFIX Exporting Processes SHOULD Templates or Options Templates. IPFIX Exporting Processes SHOULD
export anonymisation records for any Template describing exported export anonymization records for any Template describing exported
anonymised Data Records; IPFIX Collecting Processes and processes anonymized Data Records; IPFIX Collecting Processes and processes
downstream from them MAY use anonymisation records to treat downstream from them MAY use anonymization records to treat
anonymised data differently depending on the applied technique. anonymized data differently depending on the applied technique.
Anonymisation Records contain ancillary information bound to a Anonymization Records contain ancillary information bound to a
Template, so many of the considerations for Templates apply to Template, so many of the considerations for Templates apply to
Anonymisation Records as well. First, reliability is important: an Anonymization Records as well. First, reliability is important: an
Exporting Process SHOULD export Anonymisation Records after the Exporting Process SHOULD export Anonymization Records after the
Templates they describe have been exported, and SHOULD export Templates they describe have been exported, and SHOULD export
anonymisation records reliably. anonymization records reliably if supported by the underlying
transport (i.e., without partial reliability when using SCTP).
Anonymisation Records MUST be handled by Collecting Processes as Anonymization Records MUST be handled by Collecting Processes as
scoped to the Template to which they apply within the Transport scoped to the Template to which they apply within the Transport
Session in which they are sent. When a Template is withdrawn via a Session in which they are sent. When a Template is withdrawn via a
Template Withdrawal Message or expires during a UDP transport Template Withdrawal Message or expires during a UDP transport
session, the accompanying Anonymisation Records are withdrawn or session, the accompanying Anonymization Records are withdrawn or
expire as well, and do not apply to subsequent Templates with the expire as well, and do not apply to subsequent Templates with the
same Template ID within the Session unless re-exported. same Template ID within the Session unless re-exported.
The Stability Class within the anonymisationFlags IE can be used to The Stability Class within the anonymizationFlags IE can be used to
declare that a given anonymisation technique's mapping will remain declare that a given anonymization technique's mapping will remain
stable across multiple sessions, but this does not mean that stable across multiple sessions, but this does not mean that
anonymisation technique information given in the Anonymisation anonymization technique information given in the Anonymization
Records themselves persist across Sessions. Each new Transport Records themselves persist across Sessions. Each new Transport
Session MUST contain new Anonymisation Records for each Template Session MUST contain new Anonymization Records for each Template
describing anonymised Data Sets. describing anonymized Data Sets.
SCTP per-stream export [I-D.ietf-ipfix-export-per-sctp-stream] may be SCTP per-stream export [I-D.ietf-ipfix-export-per-sctp-stream] may be
used to ease management of Anonymisation Records if appropriate for used to ease management of Anonymization Records if appropriate for
the application. the application.
The fields of the Anonymisation Options template are as follows: The fields of the Anonymization Options template are as follows:
+-------------------------+-----------------------------------------+ +-------------------------+-----------------------------------------+
| IE | Description | | IE | Description |
+-------------------------+-----------------------------------------+ +-------------------------+-----------------------------------------+
| templateId [scope] | The Template ID of the Template or | | templateId [scope] | The Template ID of the Template or |
| | Options Template containing the | | | Options Template containing the |
| | Information Element described by this | | | Information Element described by this |
| | anonymisation record. This Information | | | anonymization record. This Information |
| | Element MUST be defined as a Scope | | | Element MUST be defined as a Scope |
| | Field. | | | Field. |
| informationElementId | The Information Element identifier of | | informationElementId | The Information Element identifier of |
| [scope] | the Information Element described by | | [scope] | the Information Element described by |
| | this anonymisation record. This | | | this anonymization record. This |
| | Information Element MUST be defined as | | | Information Element MUST be defined as |
| | a Scope Field. Exporting Processes | | | a Scope Field. Exporting Processes |
| | MUST clear the Enterprise bit of the | | | MUST clear the Enterprise bit of the |
| | informationElementId and Collecting | | | informationElementId and Collecting |
| | Processes SHOULD ignore it; information | | | Processes SHOULD ignore it; information |
| | about enterprise-specific Information | | | about enterprise-specific Information |
| | Elements is exported via the | | | Elements is exported via the |
| | privateEnterpriseNumber Information | | | privateEnterpriseNumber Information |
| | Element. | | | Element. |
| privateEnterpriseNumber | The Private Enterprise Number of the | | privateEnterpriseNumber | The Private Enterprise Number of the |
| [scope] [optional] | enterprise-specific Information Element | | [scope] [optional] | enterprise-specific Information Element |
| | described by this anonymisation record. | | | described by this anonymization record. |
| | This Information Element MUST be | | | This Information Element MUST be |
| | defined as a Scope Field if present. A | | | defined as a Scope Field if present. A |
| | privateEnterpriseNumber of 0 signifies | | | privateEnterpriseNumber of 0 signifies |
| | that the Information Element is | | | that the Information Element is |
| | IANA-registered. | | | IANA-registered. |
| informationElementIndex | The Information Element index of the | | informationElementIndex | The Information Element index of the |
| [scope] [optional] | instance of the Information Element | | [scope] [optional] | instance of the Information Element |
| | described by this anonymisation record | | | described by this anonymization record |
| | identified by the informationElementId | | | identified by the informationElementId |
| | within the Template. Optional; need | | | within the Template. Optional; need |
| | only be present when describing | | | only be present when describing |
| | Templates that have multiple instances | | | Templates that have multiple instances |
| | of the same Information Element. This | | | of the same Information Element. This |
| | Information Element MUST be defined as | | | Information Element MUST be defined as |
| | a Scope Field if present. This | | | a Scope Field if present. This |
| | Information Element is defined in | | | Information Element is defined in |
| | Section 6.2, below. | | | Section 6.2, below. |
| anonymisationFlags | Flags describing the mapping stability | | anonymizationFlags | Flags describing the mapping stability |
| | and specialized modifications to the | | | and specialized modifications to the |
| | Anonymisation Technique in use. SHOULD | | | Anonymization Technique in use. SHOULD |
| | be present. This Information Element | | | be present. This Information Element |
| | is defined in Section 6.2.3, below. | | | is defined in Section 6.2.3, below. |
| anonymisationTechnique | The technique used to anonymise the | | anonymizationTechnique | The technique used to anonymize the |
| | data. MUST be present. This | | | data. MUST be present. This |
| | Information Element is defined in | | | Information Element is defined in |
| | Section 6.2.2, below. | | | Section 6.2.2, below. |
+-------------------------+-----------------------------------------+ +-------------------------+-----------------------------------------+
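As an informal illustration of the fields listed above, an Anonymization Record could be modeled in memory as shown below. This is a sketch only: the class, function, and attribute names are hypothetical, no wire encoding is attempted, and the Enterprise bit is taken to be the most significant bit of the 16-bit Information Element identifier.

```python
from dataclasses import dataclass
from typing import Optional

ENTERPRISE_BIT = 0x8000  # top bit of the 16-bit Information Element ID

@dataclass(frozen=True)
class AnonymizationRecord:
    template_id: int                                 # scope, required
    information_element_id: int                      # scope, required
    anonymization_technique: int                     # MUST be present
    anonymization_flags: Optional[int] = None        # SHOULD be present
    private_enterprise_number: Optional[int] = None  # scope, optional
    information_element_index: Optional[int] = None  # scope, optional

def make_record(template_id, raw_ie_id, technique,
                flags=None, pen=None, index=None):
    """Build a record, clearing the Enterprise bit of the IE identifier."""
    return AnonymizationRecord(
        template_id=template_id,
        information_element_id=raw_ie_id & 0xFFFF & ~ENTERPRISE_BIT,
        anonymization_technique=technique,
        anonymization_flags=flags,
        private_enterprise_number=pen,
        information_element_index=index,
    )
```

An enterprise-specific element would carry its enterprise number in `private_enterprise_number` rather than in the identifier's top bit, for example `make_record(256, 0x8000 | 42, technique=5, pen=32473)`, where all numeric values are illustrative.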
6.2. Recommended Information Elements for Anonymisation Metadata 6.2. Recommended Information Elements for Anonymization Metadata
6.2.1. informationElementIndex 6.2.1. informationElementIndex
Description: A zero-based index of an Information Element Description: A zero-based index of an Information Element
referenced by informationElementId within a Template referenced by referenced by informationElementId within a Template referenced by
templateId; used to disambiguate scope for templates containing templateId; used to disambiguate scope for templates containing
multiple identical Information Elements. multiple identical Information Elements.
Abstract Data Type: unsigned16 Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: TBD3 ElementId: TBD3
Status: Proposed Status: Current
6.2.2. anonymisationTechnique 6.2.2. anonymizationTechnique
Description: A description of the anonymisation technique applied Description: A description of the anonymization technique applied
to a referenced Information Element within a referenced Template. to a referenced Information Element within a referenced Template.
Each technique may be applicable only to certain Information Each technique may be applicable only to certain Information
Elements and recommended only for certain Information Elements; Elements and recommended only for certain Information Elements;
these restrictions are noted in the table below. these restrictions are noted in the table below.
+-------+---------------------------+-----------------+-------------+ +-------+---------------------------+-----------------+-------------+
| Value | Description | Applicable to | Recommended | | Value | Description | Applicable to | Recommended |
| | | | for | | | | | for |
+-------+---------------------------+-----------------+-------------+ +-------+---------------------------+-----------------+-------------+
| 0 | Undefined: the Exporting | all | all | | 0 | Undefined: the Exporting | all | all |
| | Process makes no | | | | | Process makes no | | |
| | representation as to | | | | | representation as to | | |
| | whether the defined field | | | | | whether the defined field | | |
| | is anonymised or not. | | | | | is anonymized or not. | | |
| | While the Collecting | | | | | While the Collecting | | |
| | Process MAY assume that | | | | | Process MAY assume that | | |
| | the field is not | | | | | the field is not | | |
| | anonymised, it is not | | | | | anonymized, it is not | | |
| | guaranteed not to be. | | | | | guaranteed not to be. | | |
| | This is the default | | | | | This is the default | | |
| | anonymisation technique. | | | | | anonymization technique. | | |
| 1 | None: the values exported | all | all | | 1 | None: the values exported | all | all |
| | are real. | | | | | are real. | | |
| 2 | Precision | all | all | | 2 | Precision | all | all |
| | Degradation/Truncation: | | | | | Degradation/Truncation: | | |
| | the values exported are | | | | | the values exported are | | |
| | anonymised using simple | | | | | anonymized using simple | | |
| | precision degradation or | | | | | precision degradation or | | |
| | truncation. The new | | | | | truncation. The new | | |
| | precision or number of | | | | | precision or number of | | |
| | truncated bits is | | | | | truncated bits is | | |
| | implicit in the exported | | | | | implicit in the exported | | |
| | data, and can be deduced | | | | | data, and can be deduced | | |
| | by the Collecting | | | | | by the Collecting | | |
| | Process. | | | | | Process. | | |
| 3 | Binning: the values | all | all | | 3 | Binning: the values | all | all |
| | exported are anonymised | | | | | exported are anonymized | | |
| | into bins. | | | | | into bins. | | |
| 4 | Enumeration: the values | all | timestamps | | 4 | Enumeration: the values | all | timestamps |
| | exported are anonymised | | | | | exported are anonymized | | |
| | by enumeration. | | | | | by enumeration. | | |
| 5 | Permutation: the values | all | identifiers | | 5 | Permutation: the values | all | identifiers |
| | exported are anonymised | | | | | exported are anonymized | | |
| | by permutation. | | | | | by permutation. | | |
| 6 | Structured Permutation: | addresses | | | 6 | Structured Permutation: | addresses | |
| | the values exported are | | | | | the values exported are | | |
| | anonymised by | | | | | anonymized by | | |
| | permutation, preserving | | | | | permutation, preserving | | |
| | bit-level structure as | | | | | bit-level structure as | | |
| | appropriate; this | | | | | appropriate; this | | |
| | represents | | | | | represents | | |
| | prefix-preserving IP | | | | | prefix-preserving IP | | |
| | address anonymisation or | | | | | address anonymization or | | |
| | structured MAC address | | | | | structured MAC address | | |
| | anonymisation. | | | | | anonymization. | | |
| 7 | Reverse Truncation: the | addresses | | | 7 | Reverse Truncation: the | addresses | |
| | values exported are | | | | | values exported are | | |
| | anonymised using reverse | | | | | anonymized using reverse | | |
| | truncation. The number | | | | | truncation. The number | | |
| | of truncated bits is | | | | | of truncated bits is | | |
| | implicit in the exported | | | | | implicit in the exported | | |
| | data, and can be deduced | | | | | data, and can be deduced | | |
| | by the Collecting | | | | | by the Collecting | | |
| | Process. | | | | | Process. | | |
| 8 | Noise: the values | non-identifiers | counters | | 8 | Noise: the values | non-identifiers | counters |
| | exported are anonymised | | | | | exported are anonymized | | |
| | by adding random noise to | | | | | by adding random noise to | | |
| | each value. | | | | | each value. | | |
| 9 | Offset: the values | all | timestamps | | 9 | Offset: the values | all | timestamps |
| | exported are anonymised | | | | | exported are anonymized | | |
| | by adding a single offset | | | | | by adding a single offset | | |
| | to all values. | | | | | to all values. | | |
+-------+---------------------------+-----------------+-------------+ +-------+---------------------------+-----------------+-------------+
Abstract Data Type: unsigned16 Abstract Data Type: unsigned16
Data Type Semantics: identifier
ElementId: TBD2 ElementId: TBD2
Status: Proposed Status: Current
6.2.3. anonymisationFlags 6.2.3. anonymizationFlags
Description: A flag word describing specialized modifications to Description: A flag word describing specialized modifications to
the anonymisation policy in effect for the anonymisation technique the anonymization policy in effect for the anonymization technique
applied to a referenced Information Element within a referenced applied to a referenced Information Element within a referenced
Template. When flags are clear (0), the normal policy (as Template. When flags are clear (0), the normal policy (as
described by anonymisationTechnique) applies without modification. described by anonymizationTechnique) applies without modification.
MSB 14 13 12 11 10 9 8 7 6 5 4 3 2 1 LSB MSB 14 13 12 11 10 9 8 7 6 5 4 3 2 1 LSB
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| Reserved |LOR|PmA| SC | | Reserved |LOR|PmA| SC |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
anonymisationFlags IE
anonymizationFlags IE
+--------+----------+-----------------------------------------------+ +--------+----------+-----------------------------------------------+
| bit(s) | name | description | | bit(s) | name | description |
| (LSB = | | | | (LSB = | | |
| 0) | | | | 0) | | |
+--------+----------+-----------------------------------------------+ +--------+----------+-----------------------------------------------+
| 0-1 | SC | Stability Class: see the Stability Class | | 0-1 | SC | Stability Class: see the Stability Class |
| | | table below, and Section 5.1. | | | | table below, and Section 5.1. |
| 2 | PmA | Perimeter Anonymisation: when set (1), | | 2 | PmA | Perimeter Anonymization: when set (1), |
| | | source- Information Elements as described in | | | | source- Information Elements as described in |
| | | [RFC5103] are interpreted as external | | | | [RFC5103] are interpreted as external |
| | | addresses, and destination- Information | | | | addresses, and destination- Information |
| | | Elements as described in [RFC5103] are | | | | Elements as described in [RFC5103] are |
| | | interpreted as internal addresses, for the | | | | interpreted as internal addresses, for the |
| | | purposes of associating | | | | purposes of associating |
| | | anonymisationTechnique to Information | | | | anonymizationTechnique to Information |
| | | Elements only; see Section 7.2.2 for details. | | | | Elements only; see Section 7.2.2 for details. |
| | | This bit MUST NOT be set when associated with | | | | This bit MUST NOT be set when associated with |
| | | a non-endpoint (i.e., source- or | | | | a non-endpoint (i.e., source- or |
| | | destination-) Information Element. SHOULD be | | | | destination-) Information Element. SHOULD be |
| | | consistent within a record (i.e., if a | | | | consistent within a record (i.e., if a |
| | | source- Information Element has this flag | | | | source- Information Element has this flag |
| | | set, the corresponding destination- element | | | | set, the corresponding destination- element |
| | | SHOULD have this flag set, and vice-versa.) | | | | SHOULD have this flag set, and vice-versa.) |
| 3 | LOR | Low-Order Unchanged: when set (1), the | | 3 | LOR | Low-Order Unchanged: when set (1), the |
| | | low-order bits of the anonymised Information | | | | low-order bits of the anonymized Information |
| | | Element contain real data. This modification | | | | Element contain real data. This modification |
| | | is intended for the anonymisation of | | | | is intended for the anonymization of |
| | | network-level addresses while leaving | | | | network-level addresses while leaving |
| | | host-level addresses intact in order to | | | | host-level addresses intact in order to |
| | | preserve host-level structure, which could | | | | preserve host-level structure, which could |
| | | otherwise be used to reverse anonymisation. | | | | otherwise be used to reverse anonymization. |
| | | MUST NOT be set when associated with a | | | | MUST NOT be set when associated with a |
| | | truncation-based anonymisationTechnique. | | | | truncation-based anonymizationTechnique. |
| 4-15 | Reserved | Reserved for future use: SHOULD be cleared | | 4-15 | Reserved | Reserved for future use: SHOULD be cleared |
| | | (0) by the Exporting Process and MUST be | | | | (0) by the Exporting Process and MUST be |
| | | ignored by the Collecting Process. | | | | ignored by the Collecting Process. |
+--------+----------+-----------------------------------------------+ +--------+----------+-----------------------------------------------+
The Stability Class portion of this flags word describes the The Stability Class portion of this flags word describes the
stability class of the anonymisation technique applied to a stability class of the anonymization technique applied to a
referenced Information Element within a referenced Template. referenced Information Element within a referenced Template.
Stability classes refer to the stability of the parameters of the Stability classes refer to the stability of the parameters of the
anonymisation technique, and therefore the comparability of the anonymization technique, and therefore the comparability of the
mapping between the real and anonymised values over time. This mapping between the real and anonymized values over time. This
determines which anonymised datasets may be compared with each determines which anonymized datasets may be compared with each
other. Values are as follows: other. Values are as follows:
+-----+-----+-------------------------------------------------------+ +-----+-----+-------------------------------------------------------+
| Bit | Bit | Description | | Bit | Bit | Description |
| 1 | 0 | | | 1 | 0 | |
+-----+-----+-------------------------------------------------------+ +-----+-----+-------------------------------------------------------+
| 0 | 0 | Undefined: the Exporting Process makes no | | 0 | 0 | Undefined: the Exporting Process makes no |
| | | representation as to how stable the mapping is, or | | | | representation as to how stable the mapping is, or |
| | | over what time period values of this field will | | | | over what time period values of this field will |
| | | remain comparable; while the Collecting Process MAY | | | | remain comparable; while the Collecting Process MAY |
| | | assume Session level stability, Session level | | | | assume Session level stability, Session level |
| | | stability is not guaranteed. Processes SHOULD assume | | | | stability is not guaranteed. Processes SHOULD assume |
| | | this is the case in the absence of stability class | | | | this is the case in the absence of stability class |
| | | information; this is the default stability class. | | | | information; this is the default stability class. |
| 0 | 1 | Session: the Exporting Process will ensure that the | | 0 | 1 | Session: the Exporting Process will ensure that the |
| | | parameters of the anonymisation technique are stable | | | | parameters of the anonymization technique are stable |
| | | during the Transport Session. All the values of the | | | | during the Transport Session. All the values of the |
| | | described Information Element for each Record | | | | described Information Element for each Record |
| | | described by the referenced Template within the | | | | described by the referenced Template within the |
| | | Transport Session are comparable. The Exporting | | | | Transport Session are comparable. The Exporting |
| | | Process SHOULD endeavour to ensure at least this | | | | Process SHOULD endeavour to ensure at least this |
| | | stability class. | | | | stability class. |
| 1 | 0 | Exporter-Collector Pair: the Exporting Process will | | 1 | 0 | Exporter-Collector Pair: the Exporting Process will |
| | | ensure that the parameters of the anonymisation | | | | ensure that the parameters of the anonymization |
| | | technique are stable across Transport Sessions over | | | | technique are stable across Transport Sessions over |
| | | time with the given Collecting Process, but may use | | | | time with the given Collecting Process, but may use |
| | | different parameters for different Collecting | | | | different parameters for different Collecting |
| | | Processes. Data exported to different Collecting | | | | Processes. Data exported to different Collecting |
| | | Processes are not comparable. | | | | Processes are not comparable. |
| 1 | 1 | Stable: the Exporting Process will ensure that the | | 1 | 1 | Stable: the Exporting Process will ensure that the |
| | | parameters of the anonymisation technique are stable | | | | parameters of the anonymization technique are stable |
| | | across Transport Sessions over time, regardless of | | | | across Transport Sessions over time, regardless of |
| | | the Collecting Process to which it is sent. | | | | the Collecting Process to which it is sent. |
+-----+-----+-------------------------------------------------------+ +-----+-----+-------------------------------------------------------+
Abstract Data Type: unsigned16 Abstract Data Type: unsigned16
Data Type Semantics: flags
ElementId: TBD1 ElementId: TBD1
Status: Proposed Status: Current
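The bit layout of the flags word above can be illustrated with a short sketch. This is not part of the draft; the names StabilityClass, AnonymizationFlags, decode_flags, and encode_flags are hypothetical and chosen only for illustration. The decoder ignores the Reserved bits (4-15), as a Collecting Process is required to do, and the encoder leaves them clear.

```python
from enum import IntEnum
from typing import NamedTuple


class StabilityClass(IntEnum):
    """Bits 0-1 (SC) of the flags word, per the Stability Class table."""
    UNDEFINED = 0b00
    SESSION = 0b01
    EXPORTER_COLLECTOR_PAIR = 0b10
    STABLE = 0b11


class AnonymizationFlags(NamedTuple):
    stability: StabilityClass
    perimeter: bool            # PmA, bit 2
    low_order_unchanged: bool  # LOR, bit 3


def decode_flags(word: int) -> AnonymizationFlags:
    """Decode an unsigned16 flags word; Reserved bits 4-15 are ignored."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("flags word must fit in 16 bits")
    return AnonymizationFlags(
        stability=StabilityClass(word & 0b11),
        perimeter=bool(word & 0b0100),
        low_order_unchanged=bool(word & 0b1000),
    )


def encode_flags(stability: StabilityClass,
                 perimeter: bool = False,
                 low_order_unchanged: bool = False) -> int:
    """Encode a flags word with the Reserved bits cleared."""
    return int(stability) | (int(perimeter) << 2) | (int(low_order_unchanged) << 3)
```

A word of zero decodes to the Undefined stability class with both modifier bits clear, which matches the statement that the normal policy applies when the flags are clear.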
7. Applying Anonymisation Techniques to IPFIX Export and Storage 7. Applying Anonymization Techniques to IPFIX Export and Storage
When exporting or storing anonymised flow data using IPFIX, certain When exporting or storing anonymized flow data using IPFIX, certain
interactions between the IPFIX Protocol and the anonymisation interactions between the IPFIX Protocol and the anonymization
techniques in use must be considered; these are treated in the techniques in use must be considered; these are treated in the
subsections below. subsections below.
7.1. Arrangement of Processes in IPFIX Anonymisation 7.1. Arrangement of Processes in IPFIX Anonymization
Anonymisation may be applied to IPFIX data at three stages within the Anonymization may be applied to IPFIX data at three stages within the
collection infrastructure: on initial export, at a mediator, or after collection infrastructure: on initial export, at a mediator, or after
collection, as shown in Figure 1. Each of these locations has collection, as shown in Figure 1. Each of these locations has
specific considerations and applicability. specific considerations and applicability.
+==========================================+ +==========================================+
| Exporting Process | | Exporting Process |
+==========================================+ +==========================================+
| | | |
| (Anonymised at Original Exporter) | | (Anonymized at Original Exporter) |
V | V |
+=============================+ | +=============================+ |
| Mediator | | | Mediator | |
+=============================+ | +=============================+ |
| | | |
| (Anonymising Mediator) | | (Anonymising Mediator) |
V V V V
+==========================================+ +==========================================+
| Collecting Process | | Collecting Process |
+==========================================+ +==========================================+
| |
| (Anonymising CP/File Writer) | (Anonymising CP/File Writer)
V V
+--------------------+ +--------------------+
| IPFIX File Storage | | IPFIX File Storage |
+--------------------+ +--------------------+
Figure 1: Potential Anonymisation Locations Figure 1: Potential Anonymization Locations
Anonymisation is generally performed before the wider dissemination Anonymization is generally performed before the wider dissemination
or repurposing of a flow data set, e.g., adapting operational or repurposing of a flow data set, e.g., adapting operational
measurement data for research. Therefore, direct anonymisation of measurement data for research. Therefore, direct anonymization of
flow data on initial export is only applicable in certain restricted flow data on initial export is only applicable in certain restricted
circumstances: when the Exporting Process (EP) is "publishing" data circumstances: when the Exporting Process (EP) is "publishing" data
to a Collecting Process (CP) directly, and the Exporting Process and to a Collecting Process (CP) directly, and the Exporting Process and
Collecting Process are operated by different entities. Note that Collecting Process are operated by different entities. Note that
certain guidelines in Section 7.2.3 with respect to timestamp certain guidelines in Section 7.2.3 with respect to timestamp
anonymisation may not apply in this case, as the Collecting Process anonymization may not apply in this case, as the Collecting Process
may be able to deduce certain timing information from the time at may be able to deduce certain timing information from the time at
which each Message is received. which each Message is received.
A much more flexible arrangement is to anonymise data within a A much more flexible arrangement is to anonymize data within a
Mediator [I-D.ietf-ipfix-mediators-framework]. Here, original data Mediator [I-D.ietf-ipfix-mediators-framework]. Here, original data
is sent to a Mediator, which performs the anonymisation function and is sent to a Mediator, which performs the anonymization function and
re-exports the anonymised data. Such a Mediator could be located at re-exports the anonymized data. Such a Mediator could be located at
the administrative domain boundary of the initial Exporting Process the administrative domain boundary of the initial Exporting Process
operator, exporting anonymised data to other consumers outside the operator, exporting anonymized data to other consumers outside the
organisation. In this case, the original Exporter SHOULD use TLS as organization. In this case, the original Exporter SHOULD use TLS
specified in [RFC5101] to secure the channel to the Mediator, and the [RFC5246] as specified in [RFC5101] to secure the channel to the
Mediator should follow the guidelines in Section 7.2, to mitigate the Mediator, and the Mediator should follow the guidelines in
risk of original data disclosure. Section 7.2, to mitigate the risk of original data disclosure.
When data is to be published as an anonymised data set in an IPFIX When data is to be published as an anonymized data set in an IPFIX
File [RFC5655], the anonymisation may be done at the final Collecting File [RFC5655], the anonymization may be done at the final Collecting
Process before storage and dissemination, as well. In this case, the Process before storage and dissemination, as well. In this case, the
Collector should follow the guidelines in Section 7.2, especially as Collector should follow the guidelines in Section 7.2, especially as
regards File-specific Options in Section 7.2.4. regards File-specific Options in Section 7.2.4.
In each of these data flows, the anonymisation of records is In each of these data flows, the anonymization of records is
undertaken by an Intermediate Anonymisation Process (IAP); the data undertaken by an Intermediate Anonymization Process (IAP); the data
flows into and out of this IAP are shown in Figure 2 below. flows into and out of this IAP are shown in Figure 2 below.
packets --+ +- IPFIX Messages -+ packets --+ +- IPFIX Messages -+
| | | | | |
V V V V V V
+==================+ +====================+ +=============+ +==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader | | Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+ +==================+ +====================+ +=============+
| Non-anonymised | Records | | Non-anonymized | Records |
V V V V V V
+=========================================================+ +=========================================================+
| Intermediate Anonymisation Process (IAP) | | Intermediate Anonymization Process (IAP) |
+=========================================================+ +=========================================================+
| Anonymised ^ Anonymised | | Anonymized ^ Anonymized |
| Records | Records | | Records | Records |
V | V V | V
+===================+ Anonymisation +=============+ +===================+ Anonymization +=============+
| Exporting Process |<--- Parameters ------>| File Writer | | Exporting Process |<--- Parameters ------>| File Writer |
+===================+ +=============+ +===================+ +=============+
| | | |
+------------> IPFIX Messages <----------+ +------------> IPFIX Messages <----------+
Figure 2: Data flows through the anonymisation process Figure 2: Data flows through the anonymization process
Anonymisation parameters must also be available to the Exporting Anonymization parameters must also be available to the Exporting
Process and/or File Writer in order to ensure header data is also Process and/or File Writer in order to ensure header data is also
appropriately anonymised as in Section 7.2.3. appropriately anonymized as in Section 7.2.3.
Following each of the data flows through the IAP, we describe five Following each of the data flows through the IAP, we describe five
basic types of anonymisation arrangements within this framework in basic types of anonymization arrangements within this framework in
Figure 3. In addition to the three arrangements described in detail Figure 3. In addition to the three arrangements described in detail
above, anonymisation can also be done at a collocated Metering above, anonymization can also be done at a collocated Metering
Process (MP) and File Writer (FW) (see section 7.3.2 of [RFC5655]), Process (MP) and File Writer (FW) (see section 7.3.2 of [RFC5655]),
or at a file manipulator, which combines a File Writer with a File or at a file manipulator, which combines a File Writer with a File
Reader (FR) (see section 7.3.7 of [RFC5655]). Reader (FR) (see section 7.3.7 of [RFC5655]).
+----+ +-----+ +----+ +----+ +-----+ +----+
pkts -> | MP |->| IAP |->| EP |-> anonymisation on Original Exporter pkts -> | MP |->| IAP |->| EP |-> anonymization on Original Exporter
+----+ +-----+ +----+ +----+ +-----+ +----+
+----+ +-----+ +----+ +----+ +-----+ +----+
pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer
+----+ +-----+ +----+ +----+ +-----+ +----+
+----+ +-----+ +----+ +----+ +-----+ +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masq. Proxy) IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masq. Proxy)
+----+ +-----+ +----+ +----+ +-----+ +----+
+----+ +-----+ +----+ +----+ +-----+ +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer
+----+ +-----+ +----+ +----+ +-----+ +----+
+----+ +-----+ +----+ +----+ +-----+ +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator
File +----+ +-----+ +----+ File +----+ +-----+ +----+
Figure 3: Possible anonymisation arrangements in the IPFIX Figure 3: Possible anonymization arrangements in the IPFIX
architecture architecture
Note that anonymisation may occur at more than one location within a Note that anonymization may occur at more than one location within a
given collection infrastructure, to provide varying levels of given collection infrastructure, to provide varying levels of
anonymisation, disclosure risk, or data utility for specific anonymization, disclosure risk, or data utility for specific
purposes. purposes.
7.2. IPFIX-Specific Anonymisation Guidelines 7.2. IPFIX-Specific Anonymization Guidelines
In implementing and deploying the anonymisation techniques described In implementing and deploying the anonymization techniques described
in this document, implementors should note that IPFIX already in this document, implementors should note that IPFIX already
provides features that support anonymised data export, and use these provides features that support anonymized data export, and use these
where appropriate. Care must also be taken that data structures where appropriate. Care must also be taken that data structures
supporting the operation of the protocol itself do not leak data that supporting the operation of the protocol itself do not leak data that
could be used to reverse the anonymisation applied to the flow data. could be used to reverse the anonymization applied to the flow data.
Such data structures may appear in the header, or within the data Such data structures may appear in the header, or within the data
stream itself, especially as options data. Each of these and their stream itself, especially as options data. Each of these and their
impact on specific anonymisation techniques is noted in a separate impact on specific anonymization techniques is noted in a separate
subsection below. subsection below.
7.2.1. Appropriate Use of Information Elements for Anonymised Data 7.2.1. Appropriate Use of Information Elements for Anonymized Data
Note, as in Section 6 above, that black-marker anonymised fields Note, as in Section 6 above, that black-marker anonymized fields
SHOULD NOT be exported at all; the absence of the field in a given SHOULD NOT be exported at all; the absence of the field in a given
Data Set is implicitly declared by not including the corresponding Data Set is implicitly declared by not including the corresponding
Information Element in the Template describing that Data Set. Information Element in the Template describing that Data Set.
When using precision degradation of timestamps, Exporting Processes When using precision degradation of timestamps, Exporting Processes
SHOULD export timing information using Information Elements of an SHOULD export timing information using Information Elements of an
appropriate precision, as explained in Section 4.5 of [RFC5153]. For appropriate precision, as explained in Section 4.5 of [RFC5153]. For
example, timestamps measured in millisecond-level precision and example, timestamps measured in millisecond-level precision and
degraded to second-level precision should use flowStartSeconds and degraded to second-level precision should use flowStartSeconds and
flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds. flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.
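As an illustration of the timestamp example above, the following sketch degrades millisecond timestamps to second precision and labels the result with the second-precision Information Elements. The function name degrade_to_seconds and the dictionary representation of a record are assumptions made for this example only; truncation toward the earlier second is one possible degradation, and the draft does not mandate a rounding rule here.

```python
def degrade_to_seconds(flow_start_ms: int, flow_end_ms: int) -> dict:
    """Truncate millisecond timestamps to whole seconds.

    The result is keyed by flowStartSeconds/flowEndSeconds so that the
    exported Information Elements match the degraded precision.
    """
    return {
        "flowStartSeconds": flow_start_ms // 1000,
        "flowEndSeconds": flow_end_ms // 1000,
    }
```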
When exporting anonymised data and anonymisation metadata, Exporting When exporting anonymized data and anonymization metadata, Exporting
Processes SHOULD ensure that the combination of Information Element Processes SHOULD ensure that the combination of Information Element
and declared anonymisation technique are compatible. Specifically, and declared anonymization technique are compatible. Specifically,
the applicable and recommended Information Element types and the applicable and recommended Information Element types and
semantics for each technique are noted in the description of the semantics for each technique are noted in the description of the
anonymisationTechnique Information Element in Section 6.2.2. In this anonymizationTechnique Information Element in Section 6.2.2. In this
description, a timestamp is an Information Element with the data type description, a timestamp is an Information Element with the data type
dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
dateTimeNanoseconds; an address is an Information Element with the dateTimeNanoseconds; an address is an Information Element with the
data type ipv4Address, ipv6Address, or macAddress; and an identifier data type ipv4Address, ipv6Address, or macAddress; and an identifier
is an Information Element with identifier data type semantics. is an Information Element with identifier data type semantics.
Exporting Processes MUST NOT export Anonymisation Options records Exporting Processes MUST NOT export Anonymization Options records
binding techniques to Information Elements to which they are not binding techniques to Information Elements to which they are not
applicable, and SHOULD NOT export Anonymisation Options records applicable, and SHOULD NOT export Anonymization Options records
binding techniques to Information Elements for which they are not binding techniques to Information Elements for which they are not
recommended. recommended.
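The applicability rule above can be sketched as a lookup against the anonymizationTechnique table in Section 6.2.2. This is a rough illustration, not a normative check. The names categories and check_binding are hypothetical. Two assumptions go beyond the text: a "counter" is taken to be an Information Element with totalCounter or deltaCounter semantics, and the blank "Recommended for" cells of techniques 6 and 7 are treated as adding no restriction beyond applicability.

```python
from typing import Optional, Set

TIMESTAMP_TYPES = {"dateTimeSeconds", "dateTimeMilliseconds",
                   "dateTimeMicroseconds", "dateTimeNanoseconds"}
ADDRESS_TYPES = {"ipv4Address", "ipv6Address", "macAddress"}
COUNTER_SEMANTICS = {"totalCounter", "deltaCounter"}  # assumption, see lead-in

# technique value -> (applicable to, recommended for)
TECHNIQUES = {
    0: ("all", "all"),                  # Undefined
    1: ("all", "all"),                  # None
    2: ("all", "all"),                  # Precision Degradation/Truncation
    3: ("all", "all"),                  # Binning
    4: ("all", "timestamps"),           # Enumeration
    5: ("all", "identifiers"),          # Permutation
    6: ("addresses", "addresses"),      # Structured Permutation (blank cell assumed)
    7: ("addresses", "addresses"),      # Reverse Truncation (blank cell assumed)
    8: ("non-identifiers", "counters"), # Noise
    9: ("all", "timestamps"),           # Offset
}


def categories(data_type: str, semantics: Optional[str]) -> Set[str]:
    """Return the table categories an Information Element falls into."""
    cats = {"all"}
    if data_type in TIMESTAMP_TYPES:
        cats.add("timestamps")
    if data_type in ADDRESS_TYPES:
        cats.add("addresses")
    cats.add("identifiers" if semantics == "identifier" else "non-identifiers")
    if semantics in COUNTER_SEMANTICS:
        cats.add("counters")
    return cats


def check_binding(technique: int, data_type: str,
                  semantics: Optional[str] = None) -> str:
    """Classify a technique/IE binding as 'ok', 'SHOULD NOT', or 'MUST NOT'."""
    applicable, recommended = TECHNIQUES[technique]
    cats = categories(data_type, semantics)
    if applicable not in cats:
        return "MUST NOT"
    if recommended not in cats:
        return "SHOULD NOT"
    return "ok"
```

For instance, binding Offset (9) to a dateTimeMilliseconds element is classified as "ok", while binding Structured Permutation (6) to an unsigned16 identifier is classified as "MUST NOT".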
7.2.2. Export of Perimeter-Based Anonymisation Policies 7.2.2. Export of Perimeter-Based Anonymization Policies
Data collected from a single network may require different Data collected from a single network may require different
anonymisation policies for addresses internal and external to the anonymization policies for addresses internal and external to the
network. For example, internal addresses could be subject to simple network. For example, internal addresses could be subject to simple
permutation, while external addresses could be aggregated into permutation, while external addresses could be aggregated into
networks by truncation. When exporting anonymised perimeter networks by truncation. When exporting anonymized perimeter
bidirectional flow (biflow) data as in section 5.2 of [RFC5103], this bidirectional flow (biflow) data as in section 5.2 of [RFC5103], this
arrangement may be easily represented by specifying one technique for arrangement may be easily represented by specifying one technique for
source endpoint information (which represents the external endpoint source endpoint information (which represents the external endpoint
in a perimeter biflow) and one technique for destination endpoint in a perimeter biflow) and one technique for destination endpoint
information (which represents the internal address in a perimeter information (which represents the internal address in a perimeter
biflow). biflow).
However, it can also be useful to represent perimeter-based However, it can also be useful to represent perimeter-based
anonymisation policies with unidirectional flow (uniflow), or non- anonymization policies with unidirectional flow (uniflow), or non-
perimeter biflow data. In this case, the Perimeter Anonymisation bit perimeter biflow data. In this case, the Perimeter Anonymization bit
(bit 2) in the anonymisationFlags Information Element describing the (bit 2) in the anonymizationFlags Information Element describing the
anonymised address Information Elements can be set to change the anonymized address Information Elements can be set to change the
meaning of "source" and "destination" of Information Elements to mean meaning of "source" and "destination" of Information Elements to mean
"external" and "internal" as with perimeter biflows, but only with "external" and "internal" as with perimeter biflows, but only with
respect to anonymisation policies. respect to anonymization policies.
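The effect of the Perimeter Anonymization bit on policy selection can be sketched as follows. The function name policy_side is hypothetical, and recognizing endpoint Information Elements by a "source" or "destination" name prefix is a simplification assumed for this example; a real implementation would consult the Information Model.

```python
from typing import Optional


def policy_side(ie_name: str, pma_set: bool) -> Optional[str]:
    """Return which anonymization policy applies to an Information Element.

    With PmA set, source- elements are treated as external and
    destination- elements as internal, for policy purposes only.
    """
    if ie_name.startswith("source"):
        side = "source"
    elif ie_name.startswith("destination"):
        side = "destination"
    else:
        if pma_set:
            # PmA MUST NOT be set on a non-endpoint Information Element.
            raise ValueError("PmA set on non-endpoint element: " + ie_name)
        return None
    if pma_set:
        return "external" if side == "source" else "internal"
    return side
```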
7.2.3. Anonymisation of Header Data 7.2.3. Anonymization of Header Data
Each IPFIX Message contains a Message Header; within this Message Each IPFIX Message contains a Message Header; within this Message
Header are contained two fields which may be used to break certain Header are contained two fields which may be used to break certain
anonymisation techniques: the Export Time, and the Observation Domain anonymization techniques: the Export Time, and the Observation Domain
ID. ID.
Export of IPFIX Messages containing anonymised timestamp data where Export of IPFIX Messages containing anonymized timestamp data where
the original Export Time Message header has some relationship to the the original Export Time Message header has some relationship to the
anonymised timestamps SHOULD anonymise the Export Time header field anonymized timestamps SHOULD anonymize the Export Time header field
so that the Export Time is consistent with the anonymised timestamp so that the Export Time is consistent with the anonymized timestamp
data. Otherwise, relationships between export and flow time could be data. Otherwise, relationships between export and flow time could be
used to partially or totally reverse timestamp anonymisation. When used to partially or totally reverse timestamp anonymization. When
anonymising timestamps and the Export Time header field, Exporting Processes SHOULD avoid anonymising timestamps and the Export Time header field, Exporting Processes SHOULD avoid
times too far in the past or future; while [RFC5101] does not make times too far in the past or future; while [RFC5101] does not make
any allowance for Export Time error detection, it is sensible that any allowance for Export Time error detection, it is sensible that
Collecting Processes may interpret Messages with seemingly Collecting Processes may interpret Messages with seemingly
nonsensical Export Times as erroneous. Specific limits are nonsensical Export Times as erroneous. Specific limits are
implementation-dependent, but this issue may cause interoperability implementation-dependent, but this issue may cause interoperability
issues when anonymising the Export Time header field. issues when anonymising the Export Time header field.
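The Export Time guidance above can be sketched as follows. This is only an illustration under a stated assumption: that flow timestamps are anonymized by one constant offset, which is just one possible technique. The function name and the offset are hypothetical; the 16-byte Message Header layout (version, length, export time, sequence number, Observation Domain ID) is taken from [RFC5101].

```python
import struct

UINT32_MAX = 0xFFFFFFFF
HEADER = struct.Struct("!HHIII")  # version, length, export time, sequence, domain


def shift_export_time(message: bytes, offset_seconds: int) -> bytes:
    """Return a copy of an IPFIX Message with its Export Time shifted.

    Assumes the same constant offset was applied to the flow timestamps,
    so the interval between flow time and export time is unchanged.
    """
    version, length, export_time, sequence, domain = HEADER.unpack(message[:16])
    shifted = export_time + offset_seconds
    # Export Time is an unsigned 32-bit count of seconds; refuse to wrap.
    if not 0 <= shifted <= UINT32_MAX:
        raise ValueError("shifted Export Time does not fit in 32 bits")
    return HEADER.pack(version, length, shifted, sequence, domain) + message[16:]
```

A caller might additionally reject offsets that move the Export Time implausibly far into the past or future, since the acceptable range is implementation-dependent.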
The similarity in size between an Observation Domain ID and an IPv4 The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation address on the Metering or Exporting Process as the Observation
Domain ID. If this address bears some relation to the IP addresses Domain ID. If this address bears some relation to the IP addresses
in the flow data (e.g., shares a network prefix with internal in the flow data (e.g., shares a network prefix with internal
addresses) and the IP addresses in the flow data are anonymised in a addresses) and the IP addresses in the flow data are anonymized in a
structure-preserving way, then the Observation Domain ID may be used structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymisation. Use of an IPv4 interface to break the IP address anonymization. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation address on the Metering or Exporting Process as the Observation
Domain ID is NOT RECOMMENDED in this case. Domain ID is NOT RECOMMENDED in this case.
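As a rough illustration of the Observation Domain ID concern, a deployment could check whether a configured ID, read as an IPv4 address, falls inside a prefix that also appears in the flow data. The function name and the notion of a single "internal network" prefix are assumptions made for this sketch, not part of the draft.

```python
import ipaddress


def domain_id_leaks_prefix(observation_domain_id: int, internal_network: str) -> bool:
    """True if the 32-bit ID, read as an IPv4 address, lies in the prefix."""
    as_address = ipaddress.IPv4Address(observation_domain_id)
    return as_address in ipaddress.ip_network(internal_network)
```

An ID chosen as a small integer (such as 1) interprets as 0.0.0.1 and would not match an ordinary internal prefix, whereas an ID copied from an interface address would.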
7.2.4. Anonymisation of Options Data 7.2.4. Anonymization of Options Data
IPFIX uses the Options mechanism to export, among other things, IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure. metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in As with the IPFIX Message Header, certain Options recommended in
[RFC5101] and [RFC5655] containing flow timestamps and network [RFC5101] and [RFC5655] containing flow timestamps and network
addresses of Exporting and Collecting Processes may be used to break addresses of Exporting and Collecting Processes may be used to break
certain anonymisation techniques. When using these Options along certain anonymization techniques. When using these Options along
anonymised data export and storage, values within the Options which anonymized data export and storage, values within the Options which
could be used to break the anonymisation SHOULD themselves be could be used to break the anonymization SHOULD themselves be
anonymised or omitted. anonymized or omitted.
The Exporting Process Reliability Statistics Options Template, The Exporting Process Reliability Statistics Options Template,
recommended in [RFC5101], contains an Exporting Process ID field, recommended in [RFC5101], contains an Exporting Process ID field,
which may be an exportingProcessIPv4Address Information Element or an which may be an exportingProcessIPv4Address Information Element or an
exportingProcessIPv6Address Information Element. If the Exporting exportingProcessIPv6Address Information Element. If the Exporting
Process address bears some relation to the IP addresses in the flow Process address bears some relation to the IP addresses in the flow
data (e.g., shares a network prefix with internal addresses) and the data (e.g., shares a network prefix with internal addresses) and the
IP addresses in the flow data are anonymised in a structure- IP addresses in the flow data are anonymized in a structure-
preserving way, then the Exporting Process address may be used to preserving way, then the Exporting Process address may be used to
break the IP address anonymisation. Exporting Processes exporting break the IP address anonymization. Exporting Processes exporting
anonymised data in this situation SHOULD mitigate the risk of attack anonymized data in this situation SHOULD mitigate the risk of attack
either by omitting Options described by the Exporting Process either by omitting Options described by the Exporting Process
Reliability Statistics Options Template, or by anonymising the Reliability Statistics Options Template, or by anonymising the
Exporting Process address using a similar technique to that used to Exporting Process address using a similar technique to that used to
anonymise the IP addresses in the exported data. anonymize the IP addresses in the exported data.
Similarly, the Export Session Details Options Template and Message Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the IPFIX File Format Details Options Template specified for the IPFIX File Format
[RFC5655] may contain the exportingProcessIPv4Address Information [RFC5655] may contain the exportingProcessIPv4Address Information
Element or the exportingProcessIPv6Address Information Element to Element or the exportingProcessIPv6Address Information Element to
identify an Exporting Process from which a flow record was received, identify an Exporting Process from which a flow record was received,
and the collectingProcessIPv4Address Information Element or the and the collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in Collecting Process address bears some relation to the IP addresses in
the data set (e.g., shares a network prefix with internal addresses) the data set (e.g., shares a network prefix with internal addresses)
and the IP addresses in the data set are anonymised in a structure- and the IP addresses in the data set are anonymized in a structure-
preserving way, then the Exporting Process or Collecting Process preserving way, then the Exporting Process or Collecting Process
address may be used to break the IP address anonymisation. Since address may be used to break the IP address anonymization. Since
these Options Templates are primarily intended for storing IPFIX these Options Templates are primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes, it Transport Session data for auditing, replay, and testing purposes, it
is NOT RECOMMENDED that storage of anonymised data include these is NOT RECOMMENDED that storage of anonymized data include these
Options Templates in order to mitigate the risk of attack. Options Templates in order to mitigate the risk of attack.
The Message Details Options Template specified for the IPFIX File The Message Details Options Template specified for the IPFIX File
Format [RFC5655] also contains the collectionTimeMilliseconds Format [RFC5655] also contains the collectionTimeMilliseconds
Information Element. As with the Export Time Message Header field, Information Element. As with the Export Time Message Header field,
if the exported data set contains anonymised timestamp information, if the exported data set contains anonymized timestamp information,
and the collectionTimeMilliseconds Information Element in a given and the collectionTimeMilliseconds Information Element in a given
Message has some relationship to the anonymised timestamp Message has some relationship to the anonymized timestamp
information, then this relationship can be exploited to reverse the information, then this relationship can be exploited to reverse the
timestamp anonymisation. Since this Options Template is primarily timestamp anonymization. Since this Options Template is primarily
intended for storing IPFIX Transport Session data for auditing, intended for storing IPFIX Transport Session data for auditing,
replay, and testing purposes, it is NOT RECOMMENDED that storage of replay, and testing purposes, it is NOT RECOMMENDED that storage of
anonymised data include this Options Template in order to mitigate anonymized data include this Options Template in order to mitigate
the risk of attack. the risk of attack.
Since the Time Window Options Template specified for the IPFIX File Since the Time Window Options Template specified for the IPFIX File
Format [RFC5655] refers to the timestamps within the data set to Format [RFC5655] refers to the timestamps within the data set to
provide partial table of contents information for an IPFIX File, provide partial table of contents information for an IPFIX File,
Options described by this template SHOULD be written using the Options described by this template SHOULD be written using the
anonymised timestamps instead of the original ones. anonymized timestamps instead of the original ones.
7.2.5. Special-Use Address Space Considerations 7.2.5. Special-Use Address Space Considerations
When anonymising data for transport or storage using IPFIX containing When anonymising data for transport or storage using IPFIX containing
anonymised IP addresses, and the analysis purpose permits doing so, anonymized IP addresses, and the analysis purpose permits doing so,
it is RECOMMENDED to filter out or leave unanonymised data containing it is RECOMMENDED to filter out or leave unanonymized data containing
the special-use IPv4 addresses enumerated in [RFC5735] or the the special-use IPv4 addresses enumerated in [RFC5735] or the
special-use IPv6 addresses enumerated in [RFC5156]. Data containing special-use IPv6 addresses enumerated in [RFC5156]. Data containing
these addresses (e.g. 0.0.0.0 and 169.254.0.0/16 for link-local these addresses (e.g. 0.0.0.0 and 169.254.0.0/16 for link-local
autoconfiguration in IPv4 space) are often associated with specific, autoconfiguration in IPv4 space) are often associated with specific,
well-known behavioral patterns. Detection of these patterns in well-known behavioral patterns. Detection of these patterns in
anonymised data can lead to deanonymisation of these special-use anonymized data can lead to deanonymization of these special-use
addresses, which increases the chance of a complete reversal of addresses, which increases the chance of a complete reversal of
anonymisation by an attacker, especially of prefix-preserving anonymization by an attacker, especially of prefix-preserving
techniques. techniques.
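The filtering recommendation above might be sketched as below. The prefix list is a deliberately small, illustrative subset of the special-use IPv4 blocks in [RFC5735]; a real implementation would load the complete lists from [RFC5735] and [RFC5156]. The record layout (dictionaries with "sip" and "dip" keys) and the function names are hypothetical.

```python
import ipaddress

# Illustrative subset only; see [RFC5735] for the full IPv4 list.
SPECIAL_USE_V4 = [
    ipaddress.ip_network(prefix)
    for prefix in ("0.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16", "224.0.0.0/4")
]


def is_special_use(address: str) -> bool:
    """True if the IPv4 address falls in one of the listed blocks."""
    addr = ipaddress.IPv4Address(address)
    return any(addr in net for net in SPECIAL_USE_V4)


def split_records(records):
    """Partition records into (to_anonymize, set_aside).

    A record is set aside if either endpoint is a special-use address;
    the caller then drops it or exports it unanonymized.
    """
    to_anonymize, set_aside = [], []
    for record in records:
        if is_special_use(record["sip"]) or is_special_use(record["dip"]):
            set_aside.append(record)
        else:
            to_anonymize.append(record)
    return to_anonymize, set_aside
```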
7.2.6. Protecting Out-of-Band Configuration and Management Data 7.2.6. Protecting Out-of-Band Configuration and Management Data
Special care should be taken when exporting or sharing anonymised Special care should be taken when exporting or sharing anonymized
data to avoid information leakage via the configuration or management data to avoid information leakage via the configuration or management
planes of the IPFIX Device containing the Exporting Process or the planes of the IPFIX Device containing the Exporting Process or the
File Writer. For example, adding noise to counters is useless if the File Writer. For example, adding noise to counters is useless if the
receiver can deduce the values in the counters from SNMP information, receiver can deduce the values in the counters from SNMP information,
and concealing the network under test is similarly useless if such and concealing the network under test is similarly useless if such
information is available in a configuration document. As the information is available in a configuration document. As the
specifics of these concerns are largely implementation- and specifics of these concerns are largely implementation- and
deployment-dependent, specific mitigation is out of scope for this deployment-dependent, specific mitigation is out of scope for this
draft. The general ground rule is that information of similar type draft. The general ground rule is that information of similar type
to that anonymised SHOULD NOT be made available to the receiver by to that anonymized SHOULD NOT be made available to the receiver by
any means, whether in the Data Records, in IPFIX protocol structures any means, whether in the Data Records, in IPFIX protocol structures
such as Message Headers, or out-of-band. such as Message Headers, or out-of-band.
8. Examples 8. Examples
In this example, consider the export or storage of an anonymised IPv4 In this example, consider the export or storage of an anonymized IPv4
data set from a single network described by a simple template data set from a single network described by a simple template
containing a timestamp in seconds, a five-tuple, and packet and octet containing a timestamp in seconds, a five-tuple, and packet and octet
counters. The template describing each record in this data set is counters. The template describing each record in this data set is
shown in figure Figure 4. shown in figure Figure 4.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 2 | Length = 40 | | Set ID = 2 | Length = 40 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 33, line 31 skipping to change at page 35, line 31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| packetDeltaCount 2 | Field Length = 4 | |0| packetDeltaCount 2 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| octetDeltaCount 1 | Field Length = 4 | |0| octetDeltaCount 1 | Field Length = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| protocolIdentifier 4 | Field Length = 1 | |0| protocolIdentifier 4 | Field Length = 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: Example Flow Template Figure 4: Example Flow Template
Suppose that this data set is anonymised according to the following Suppose that this data set is anonymized according to the following
policy: policy:
o IP addresses within the network are protected by reverse o IP addresses within the network are protected by reverse
truncation. truncation.
o IP addresses outside the network are protected by prefix- o IP addresses outside the network are protected by prefix-
preserving anonymisation. preserving anonymization.
o Octet counts are exported using degraded precision in order to o Octet counts are exported using degraded precision in order to
provide minimal protection against fingerprinting attacks. provide minimal protection against fingerprinting attacks.
o All other fields are exported unanonymised. o All other fields are exported unanonymized.
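Two of the techniques named in this policy are simple enough to sketch directly. The parameters used here (zeroing the first 24 bits for reverse truncation, and rounding octet counts to the nearest 100) are assumptions chosen for illustration; the policy above does not state them. Prefix-preserving anonymization is omitted because it depends on the specific algorithm chosen.

```python
import ipaddress


def reverse_truncate(address: str, bits: int = 24) -> str:
    """Zero the first `bits` bits of an IPv4 address, keeping the host part."""
    mask = (1 << (32 - bits)) - 1
    value = int(ipaddress.IPv4Address(address)) & mask
    return str(ipaddress.IPv4Address(value))


def degrade_precision(count: int, granularity: int = 100) -> int:
    """Round a counter to the nearest multiple of `granularity`.

    Integer arithmetic is used so that halfway cases always round up.
    """
    return (count + granularity // 2) // granularity * granularity
```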
In order to export anonymisation records for this template and In order to export anonymization records for this template and
policy, first, the Anonymisation Options Template shown in figure policy, first, the Anonymization Options Template shown in figure
Figure 5 is exported. For this example, the optional Figure 5 is exported. For this example, the optional
privateEnterpriseNumber and informationElementIndex Information privateEnterpriseNumber and informationElementIndex Information
Elements are omitted, because they are not used. Elements are omitted, because they are not used.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 3 | Length = 26 | | Set ID = 3 | Length = 26 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template ID = 257 | Field Count = 4 | | Template ID = 257 | Field Count = 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Scope Field Count = 2 |0| templateID 145 | | Scope Field Count = 2 |0| templateID 145 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| informationElementId 303 | | Field Length = 2 |0| informationElementId 303 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| anonymisationFlags TBD1 | | Field Length = 2 |0| anonymizationFlags TBD1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 |0| anonymisationTechnique TBD2 | | Field Length = 2 |0| anonymizationTechnique TBD2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2 | | Field Length = 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Example Anonymisation Options Template Figure 5: Example Anonymization Options Template
Following the Anonymisation Options Template comes a Data Set Following the Anonymization Options Template comes a Data Set
containing Anonymisation Records. This data set has an entry for containing Anonymization Records. This data set has an entry for
each Information Element Specifier in Template 256 describing the each Information Element Specifier in Template 256 describing the
flow records. This Data Set is shown in figure Figure 6. Note that flow records. This Data Set is shown in figure Figure 6. Note that
sourceIPv4Address and destinationIPv4Address have the Perimeter sourceIPv4Address and destinationIPv4Address have the Perimeter
Anonymisation (0x0004) flag set in anonymisationFlags, meaning that Anonymization (0x0004) flag set in anonymizationFlags, meaning that
source address should be treated as network-external, and the source address should be treated as network-external, and the
destination address as network-internal. destination address as network-internal.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 257 | Length = 68 | | Set ID = 257 | Length = 68 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | flowStartSeconds IE 150 | | Template 256 | flowStartSeconds IE 150 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymised 1 | | no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | sourceIPv4Address IE 8 | | Template 256 | sourceIPv4Address IE 8 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC 0x0005 | Structured Permutation 6 | | Perimeter, Session SC 0x0005 | Structured Permutation 6 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | destinationIPv4Address IE 12 | | Template 256 | destinationIPv4Address IE 12 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Stable 0x0007 | Reverse Truncation 7 | | Perimeter, Stable 0x0007 | Reverse Truncation 7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | sourceTransportPort IE 7 | | Template 256 | sourceTransportPort IE 7 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymised 1 | | no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | dest.TransportPort IE 11 | | Template 256 | dest.TransportPort IE 11 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymised 1 | | no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | packetDeltaCount IE 2 | | Template 256 | packetDeltaCount IE 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymised 1 | | no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | octetDeltaCount IE 1 | | Template 256 | octetDeltaCount IE 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stable 0x0003 | Precision Degradation 2 | | Stable 0x0003 | Precision Degradation 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256 | protocolIdentifier IE 4 | | Template 256 | protocolIdentifier IE 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000 | Not Anonymised 1 | | no flags 0x0000 | Not Anonymized 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: Example Anonymisation Records Figure 6: Example Anonymization Records
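The Data Set in Figure 6 can be reproduced byte for byte with a short sketch. The Information Element numbers, flag values, and technique codes are copied from the figure; the function and constant names are hypothetical. Each Anonymization Record is four 16-bit fields in network byte order, matching the Options Template in Figure 5.

```python
import struct

TEMPLATE_ID = 256  # the flow template being described

# (informationElementId, anonymization flags, anonymization technique)
RECORDS = [
    (150, 0x0000, 1),  # flowStartSeconds: not anonymized
    (8,   0x0005, 6),  # sourceIPv4Address: structured permutation
    (12,  0x0007, 7),  # destinationIPv4Address: reverse truncation
    (7,   0x0000, 1),  # sourceTransportPort: not anonymized
    (11,  0x0000, 1),  # destinationTransportPort: not anonymized
    (2,   0x0000, 1),  # packetDeltaCount: not anonymized
    (1,   0x0003, 2),  # octetDeltaCount: precision degradation
    (4,   0x0000, 1),  # protocolIdentifier: not anonymized
]


def build_anonymization_data_set(set_id: int = 257) -> bytes:
    """Encode the records as one IPFIX Data Set (Set Header plus records)."""
    body = b"".join(
        struct.pack("!HHHH", TEMPLATE_ID, ie, flags, technique)
        for ie, flags, technique in RECORDS
    )
    # The Set Header is the Set ID followed by the total length, header included.
    return struct.pack("!HH", set_id, 4 + len(body)) + body
```

Eight records of 8 bytes each plus the 4-byte Set Header give the length of 68 shown in the figure.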
Following the Anonymisation Records come the data sets containing the Following the Anonymization Records come the data sets containing the
anonymised data, exported according to the template in figure anonymized data, exported according to the template in figure
Figure 4. Bringing it all together, consider an IPFIX Message Figure 4. Bringing it all together, consider an IPFIX Message
containing three real data records and the necessary templates to containing three real data records and the necessary templates to
export them, shown in Figure 7. (Note that the scale of this message export them, shown in Figure 7. (Note that the scale of this message
is 8-bytes per line, for compactness; lines of dots '. . . . . ' is 8-bytes per line, for compactness; lines of dots '. . . . . '
represent shifting of the example bit structure for clarity.) represent shifting of the example bit structure for clarity.)
1 2 3 4 5 6 1 2 3 4 5 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a | length 135 | export time 1271227717 | msg | 0x000a | length 135 | export time 1271227717 | msg
| sequence 0 | domain 1 | hdr | sequence 0 | domain 1 | hdr
skipping to change at page 36, line 30 skipping to change at page 38, line 30
| packets 60 | bytes 2896 | | packets 60 | bytes 2896 |
| prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . . | prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683 | sip 198.51.100.7 | | time 1271227683 | sip 198.51.100.7 |
| dip 203.0.113.9 | sp 5092 | dp 80 | | dip 203.0.113.9 | sp 5092 | dp 80 |
| packets 44 | bytes 2037 | | packets 44 | bytes 2037 |
| prt 6 | | prt 6 |
+---------+ +---------+
Figure 7: Example Real Message Figure 7: Example Real Message
The corresponding anonymised message is then shown in Figure 8. The The corresponding anonymized message is then shown in Figure 8. The
options template set describing Anonymisation Records and the options template set describing Anonymization Records and the
Anonymisation Records themselves are added; IP addresses and byte Anonymization Records themselves are added; IP addresses and byte
counts are anonymised as declared. counts are anonymized as declared.
1 2 3 4 5 6 1 2 3 4 5 6
0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a | length 233 | export time 1271227717 | msg | 0x000a | length 233 | export time 1271227717 | msg
| sequence 0 | domain 1 | hdr | sequence 0 | domain 1 | hdr
| SetID 2 | length 40 | tid 256 | fields 8 | tmpl | SetID 2 | length 40 | tid 256 | fields 8 | tmpl
| IE 150 | length 4 | IE 8 | length 4 | set | IE 150 | length 4 | IE 8 | length 4 | set
| IE 12 | length 4 | IE 7 | length 2 | | IE 12 | length 4 | IE 7 | length 2 |
| IE 11 | length 2 | IE 2 | length 4 | | IE 11 | length 2 | IE 2 | length 4 |
skipping to change at page 37, line 42 skipping to change at page 39, line 42
| time 1271227682 | sip 0.0.0.7 | | time 1271227682 | sip 0.0.0.7 |
| dip 254.202.119.6 | sp 5091 | dp 80 | | dip 254.202.119.6 | sp 5091 | dp 80 |
| packets 60 | bytes 2900 | | packets 60 | bytes 2900 |
| prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . . | prt 6 | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683 | sip 0.0.0.7 | | time 1271227683 | sip 0.0.0.7 |
| dip 2.19.199.176 | sp 5092 | dp 80 | | dip 2.19.199.176 | sp 5092 | dp 80 |
| packets 60 | bytes 2000 | | packets 60 | bytes 2000 |
| prt 6 | | prt 6 |
+---------+ +---------+
Figure 8: Corresponding Anonymised Message Figure 8: Corresponding Anonymized Message
9. Security Considerations 9. Security Considerations
This document provides guidelines for exporting metadata about This document provides guidelines for exporting metadata about
anonymised data in IPFIX, or storing metadata about anonymised data anonymized data in IPFIX, or storing metadata about anonymized data
in IPFIX Files. It is not intended as a general statement on the in IPFIX Files. It is not intended as a general statement on the
applicability of specific flow data anonymisation techniques. applicability of specific flow data anonymization techniques.
Exporters or publishers of anonymised data must take care that the Exporters or publishers of anonymized data must take care that the
applied anonymisation technique is appropriate for the data source, applied anonymization technique is appropriate for the data source,
the purpose, and the risk of deanonymisation of a given application. the purpose, and the risk of deanonymization of a given application.
Research in anonymization techniques, and techniques for
deanonymization, is ongoing, and currently "safe" anonymization
techniques may be rendered unsafe by future developments.
We note specifically that anonymisation is not a replacement for We note specifically that anonymization is not a replacement for
encryption for confidentiality. It is only appropriate for encryption for confidentiality. It is only appropriate for
protecting identifying information in data to be used for purposes in protecting identifying information in data to be used for purposes in
which the protected data is irrelevant. Confidentiality in export is which the protected data is irrelevant. Confidentiality in export is
best served by using TLS or DTLS as in the Security Considerations best served by using TLS [RFC5246] or DTLS [RFC4347] as in the
section of [RFC5101], and in long-term storage by implementation- Security Considerations section of [RFC5101], and in long-term
specific protection applied as in the Security Considerations section storage by implementation-specific protection applied as in the
of [RFC5655]. Indeed, confidentiality and anonymisation are not Security Considerations section of [RFC5655]. Indeed,
mutually exclusive, as encryption for confidentiality may be applied confidentiality and anonymization are not mutually exclusive, as
to anonymised data export or storage, as well, when the anonymised encryption for confidentiality may be applied to anonymized data
data is not intended for public release. export or storage, as well, when the anonymized data is not intended
for public release.
When using pseudonymisation techniques that have a mutable mapping, We note as well that care should be taken even with well-anonymized
data, and anonymized data should still be treated as privacy-
sensitive. Anonymization reduces the risk of misuse, but is not a
complete solution to the problem of protecting end-user privacy in
network flow trace analysis.
When using pseudonymization techniques that have a mutable mapping,
there is an inherent tradeoff in the stability of the map between there is an inherent tradeoff in the stability of the map between
long-term comparability and security of the data set against long-term comparability and security of the data set against
deanonymisation. In general, deanonymisation attacks are more deanonymization. In general, deanonymization attacks are more
effective given more information, so the longer a given mapping is effective given more information, so the longer a given mapping is
valid, the more information can be applied to deanonymisation. The valid, the more information can be applied to deanonymization. The
specific details of this are technique-dependent and therefore out of specific details of this are technique-dependent and therefore out of
the scope of this document. the scope of this document.
When releasing anonymised data, publishers need to ensure that data When releasing anonymized data, publishers need to ensure that data
that could be used in deanonymisation is not leaked through the that could be used in deanonymization is not leaked through a side
export protocol; guidelines for addressing this risk are provided in channel. The entire workflow (hardware, software, operational
Section 7.2. policies and procedures, etc.) for handling anonymized data must be
evaluated for risk of data leakage. While most of these possible
side channels are out of scope for this document, guidelines for
reducing the risk of information leakage specific to the IPFIX export
protocol are provided in Section 7.2.
Note as well that the Security Considerations section of [RFC5101] Note as well that the Security Considerations section of [RFC5101]
applies as well to the export of anonymised data, and the Security applies as well to the export of anonymized data, and the Security
Considerations section of [RFC5655] to the storage of anonymised Considerations section of [RFC5655] to the storage of anonymized
data, or the publication of anonymised traces. data, or the publication of anonymized traces.
10. IANA Considerations 10. IANA Considerations
This document specifies the creation of several new IPFIX Information This document specifies the creation of several new IPFIX Information
Elements in the IPFIX Information Element registry located at Elements in the IPFIX Information Element registry located at
http://www.iana.org/assignments/ipfix, as defined in Section 6.2 http://www.iana.org/assignments/ipfix, as defined in Section 6.2
above. IANA has assigned the following Information Element numbers above. IANA has assigned the following Information Element numbers
for their respective Information Elements as specified below: for their respective Information Elements as specified below:
o Information Element number TBD1 for the anonymisationFlags o Information Element number TBD1 for the anonymizationFlags
Information Element. Information Element.
o Information Element number TBD2 for the anonymisationTechnique o Information Element number TBD2 for the anonymizationTechnique
Information Element. Information Element.
o Information Element number TBD3 for the informationElementIndex o Information Element number TBD3 for the informationElementIndex
Information Element. Information Element.
[NOTE for IANA: The text TBDn should be replaced with the respective [NOTE for IANA: The text TBDn should be replaced with the respective
assigned Information Element numbers where they appear in this assigned Information Element numbers where they appear in this
document.] document. Information Element numbers should be assigned outside the
NetFlow V9 compatibility range, as these Information Elements are not
supported by NetFlow V9.]
11. Acknowledgments 11. Acknowledgments
We thank Paul Aitken and John McHugh for their comments and insight, We thank Paul Aitken and John McHugh for their comments and insight,
and Carsten Schmoll, Benoit Claise, Lothar Braun, and Dan Romascanu and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu,
for their reviews. Special thanks to the ICT-PRISM project for its Stewart Bryant, and Sean Turner for their reviews. Special thanks to
material support of this work. the FP7 PRISM and DEMONS projects for their material support of this
work.
12. References 12. References
12.1. Normative References 12.1. Normative References
[RFC5101] Claise, B., "Specification of the IP Flow Information [RFC5101] Claise, B., "Specification of the IP Flow Information
Export (IPFIX) Protocol for the Exchange of IP Traffic Export (IPFIX) Protocol for the Exchange of IP Traffic
Flow Information", RFC 5101, January 2008. Flow Information", RFC 5101, January 2008.
[RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. [RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
skipping to change at page 40, line 18 skipping to change at page 42, line 31
"Architecture for IP Flow Information Export", RFC 5470, "Architecture for IP Flow Information Export", RFC 5470,
March 2009. March 2009.
[RFC5472] Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP [RFC5472] Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
Flow Information Export (IPFIX) Applicability", RFC 5472, Flow Information Export (IPFIX) Applicability", RFC 5472,
March 2009. March 2009.
[I-D.ietf-ipfix-mediators-framework] [I-D.ietf-ipfix-mediators-framework]
Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi, Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
"IPFIX Mediation: Framework", "IPFIX Mediation: Framework",
draft-ietf-ipfix-mediators-framework-08 (work in draft-ietf-ipfix-mediators-framework-09 (work in
progress), August 2010. progress), October 2010.
[I-D.ietf-ipfix-export-per-sctp-stream] [I-D.ietf-ipfix-export-per-sctp-stream]
Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX
Export per SCTP Stream", Export per SCTP Stream",
draft-ietf-ipfix-export-per-sctp-stream-08 (work in draft-ietf-ipfix-export-per-sctp-stream-08 (work in
progress), May 2010. progress), May 2010.
[RFC5153] Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P. [RFC5153] Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
Aitken, "IP Flow Information Export (IPFIX) Implementation Aitken, "IP Flow Information Export (IPFIX) Implementation
Guidelines", RFC 5153, April 2008. Guidelines", RFC 5153, April 2008.
[RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, [RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander,
"Requirements for IP Flow Information Export (IPFIX)", "Requirements for IP Flow Information Export (IPFIX)",
RFC 3917, October 2004. RFC 3917, October 2004.
[RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing
Architecture", RFC 4291, February 2006. Architecture", RFC 4291, February 2006.
[RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer
Security", RFC 4347, April 2006.
[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
(TLS) Protocol Version 1.2", RFC 5246, August 2008.
[Bur10] Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi,
"The Role of Network Trace Anonymization Under Attack",
ACM Computer Communications Review, vol. 40, no. 1, pp.
6-11, January 2010.
[Mur07] Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by
Internet-Exchange-Level Adversaries", Proceedings of the
7th Workshop on Privacy Enhancing Technologies, Ottawa,
Canada., June 2007.
Authors' Addresses Authors' Addresses
Elisa Boschi Elisa Boschi
Swiss Federal Institute of Technology Zurich Swiss Federal Institute of Technology Zurich
Gloriastrasse 35 Gloriastrasse 35
8092 Zurich 8092 Zurich
Switzerland Switzerland
Email: boschie@tik.ee.ethz.ch Email: boschie@tik.ee.ethz.ch
Brian Trammell Brian Trammell
Swiss Federal Institute of Technology Zurich Swiss Federal Institute of Technology Zurich
Gloriastrasse 35 Gloriastrasse 35
8092 Zurich 8092 Zurich
Switzerland Switzerland
Phone: +41 44 632 70 13 Phone: +41 44 632 70 13
Email: trammell@tik.ee.ethz.ch Email: trammell@tik.ee.ethz.ch
 End of changes. 285 change blocks. 
484 lines changed or deleted 606 lines changed or added
