IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                                 ETH Zurich
Expires: July 23, 2011                                  January 19, 2011

                     IP Flow Anonymization Support
                      draft-ietf-ipfix-anon-06.txt

Abstract

This document describes anonymization techniques for IP flow data and the export of anonymized data using the IPFIX protocol. It categorizes common anonymization schemes and defines the parameters needed to describe them. It provides guidelines for the implementation of anonymized data export and storage over IPFIX, and describes an information model and Options-based method for anonymization metadata export within the IPFIX protocol or storage in IPFIX Files.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on July 23, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. IPFIX Protocol Overview
  1.2. IPFIX Documents Overview
  1.3. Anonymization within the IPFIX Architecture
  1.4. Supporting Experimentation with Anonymization
2. Terminology
3. Categorization of Anonymization Techniques
4. Anonymization of IP Flow Data
  4.1. IP Address Anonymization
    4.1.1. Truncation
    4.1.2. Reverse Truncation
    4.1.3. Permutation
    4.1.4. Prefix-preserving Pseudonymization
  4.2. MAC Address Anonymization
    4.2.1. Truncation
    4.2.2. Reverse Truncation
    4.2.3. Permutation
    4.2.4. Structured Pseudonymization
  4.3. Timestamp Anonymization
    4.3.1. Precision Degradation
    4.3.2. Enumeration
    4.3.3. Random Shifts
  4.4. Counter Anonymization
    4.4.1. Precision Degradation
    4.4.2. Binning
    4.4.3. Random Noise Addition
  4.5. Anonymization of Other Flow Fields
    4.5.1. Binning
    4.5.2. Permutation
5. Parameters for the Description of Anonymization Techniques
  5.1. Stability
  5.2. Truncation Length
  5.3. Bin Map
  5.4. Permutation
  5.5. Shift Amount
6. Anonymization Export Support in IPFIX
  6.1. Anonymization Records and the Anonymization Options Template
  6.2. Recommended Information Elements for Anonymization Metadata
    6.2.1. informationElementIndex
    6.2.2. anonymizationTechnique
    6.2.3. anonymizationFlags
7. Applying Anonymization Techniques to IPFIX Export and Storage
  7.1. Arrangement of Processes in IPFIX Anonymization
  7.2. IPFIX-Specific Anonymization Guidelines
    7.2.1. Appropriate Use of Information Elements for Anonymized Data
    7.2.2. Export of Perimeter-Based Anonymization Policies
    7.2.3. Anonymization of Header Data
    7.2.4. Anonymization of Options Data
    7.2.5. Special-Use Address Space Considerations
    7.2.6. Protecting Out-of-Band Configuration and Management Data
8. Examples
9. Security Considerations
10. IANA Considerations
11. Acknowledgments
12. References
  12.1. Normative References
  12.2. Informative References
Authors' Addresses

1. Introduction

The standardization of an IP flow information export protocol [RFC5101] and associated representations removes a technical barrier to the sharing of IP flow data across organizational boundaries and with network operations, security, and research communities for a wide variety of purposes. However, with wider dissemination come greater risks to the privacy of the users of networks under measurement, and to the security of those networks. While it is not a complete solution to the issues posed by distribution of IP flow information, anonymization (i.e., the deletion or transformation of information that is considered sensitive and could be used to reveal the identity of subjects involved in a communication) is an important tool for the protection of privacy within network measurement infrastructures.

This document presents a mechanism for representing anonymized data within IPFIX and guidelines for using it. It is not intended as a general statement on the applicability of specific flow data anonymization techniques to specific situations, or as a recommendation of any particular application of anonymization to flow data export.
Exporters or publishers of anonymized data must take care that the applied anonymization technique is appropriate for the data source, the purpose, and the risk of deanonymization of a given application.

This document begins with a categorization of anonymization techniques. It then describes applicability of each technique to commonly anonymizable fields of IP flow data, organized by information element data type and semantics as in [RFC5102]; enumerates the parameters required by each of the applicable anonymization techniques; and provides guidelines for the use of each of these techniques in accordance with current best practices in data protection. Finally, it specifies a mechanism for exporting anonymized data and binding anonymization metadata to Templates and Options Templates using IPFIX Options.

1.1. IPFIX Protocol Overview

In the IPFIX protocol, { type, length, value } tuples are expressed in Templates containing { type, length } pairs, specifying which { value } fields are present in data records conforming to the Template, giving great flexibility as to what data is transmitted. Since Templates are sent very infrequently compared with Data Records, this results in significant bandwidth savings. Different data formats may be transmitted simply by sending new Templates specifying the { type, length } pairs for the new data format. See [RFC5101] for more information.

The IPFIX information model [RFC5102] defines a large number of standard Information Elements which provide the necessary { type } information for Templates. The use of standard elements enables interoperability among different vendors' implementations. Additionally, non-standard enterprise-specific elements may be defined for private use.

1.2. IPFIX Documents Overview

"Specification of the IPFIX Protocol for the Exchange of IP Traffic Flow Information" [RFC5101] and its associated documents define the IPFIX Protocol, which provides network engineers and administrators with access to IP traffic flow information.

"Architecture for IP Flow Information Export" [RFC5470] defines the architecture for the export of measured IP flow information out of an IPFIX Exporting Process to an IPFIX Collecting Process, and the basic terminology used to describe the elements of this architecture, per the requirements defined in "Requirements for IP Flow Information Export" [RFC3917]. The IPFIX Protocol document [RFC5101] then covers the details of the method for transporting IPFIX Data Records and Templates via a congestion-aware transport protocol from an IPFIX Exporting Process to an IPFIX Collecting Process.

"Information Model for IP Flow Information Export" [RFC5102] describes the Information Elements used by IPFIX, including details on Information Element naming, numbering, and data type encoding. Finally, "IPFIX Applicability" [RFC5472] describes the various applications of the IPFIX protocol and their use of information exported via IPFIX, and relates the IPFIX architecture to other measurement architectures and frameworks.

Additionally, "Specification of the IPFIX File Format" [RFC5655] describes a file format based upon the IPFIX Protocol for the storage of flow data.

This document references the Protocol and Architecture documents for terminology, and extends the IPFIX Information Model to provide new Information Elements for anonymization metadata. The anonymization techniques described herein are equally applicable to the IPFIX Protocol and data stored in IPFIX Files.
1.3. Anonymization within the IPFIX Architecture

According to [RFC5470], IPFIX Message anonymization is optionally performed as the final operation before handing the Message to the transport protocol for export. While no provision is made in the architecture for anonymization metadata as in Section 6, this arrangement does allow for the rewriting necessary for comprehensive anonymization of IPFIX export as in Section 7. The development of the IPFIX Mediation framework [I-D.ietf-ipfix-mediators-framework] and the IPFIX File Format [RFC5655] expands upon this initial architectural allowance for anonymization by adding to the list of places that anonymization may be applied. The former specifies IPFIX Mediators, which rewrite existing IPFIX Messages, and the latter specifies a method for storage of IPFIX data in files.

More detail on the applicable architectural arrangements for anonymization can be found in Section 7.1.

1.4. Supporting Experimentation with Anonymization

The intended status of this document is Experimental, reflecting the experimental nature of anonymization export support. Research on network trace anonymization techniques and attacks against them is ongoing. Indeed, there is increasing evidence that anonymization applied to network trace or flow data on its own is insufficient for many data protection applications, as in [Bur10]. Therefore, this document explicitly does not recommend any particular technique or implementation thereof.

The intention of this document is to provide a common basis for interoperable exchange of anonymized data, furthering research in this area, both on anonymization techniques themselves and on the application of anonymized data to network measurement.
To that end, the classification in Section 3 and anonymization export support in Section 6 can be used to describe and export information even about data anonymized using techniques that are unacceptably weak for general application to production data sets on their own.

While the specification herein is designed to be implementation- and technique-independent, open research in this area may necessitate future updates to the specification. Assuming the future successful application of this specification to anonymized data publication and exchange, it may be brought back to the IPFIX working group for further development and publication on the standards track.

2. Terminology

Terms used in this document that are defined in the Terminology section of the IPFIX Protocol [RFC5101] document are to be interpreted as defined there. In addition, this document defines the following terms:

Anonymization Record: A record, defined by the Anonymization Options Template in Section 6.1, that defines the properties of the anonymization applied to a single Information Element within a single Template or Options Template.

Anonymized Data Record: A Data Record within a Data Set containing at least one Information Element with anonymized values. The Information Element(s) within the Template or Options Template describing this Data Record SHOULD have a corresponding Anonymization Record.

Intermediate Anonymization Process: An intermediate process which takes Data Records and transforms them into Anonymized Data Records.

Note that there is an explicit difference in this document between a "Data Set" (which is defined as in [RFC5101]) and a "data set". When in lower case, this term refers to any collection of data (usually, within the context of this document, flow or packet data) which may contain identifying information and is therefore subject to anonymization.
Note also that when the term Template is used in this document, unless otherwise noted, it applies both to Templates and Options Templates as defined in [RFC5101]. Specifically, Anonymization Records may apply to both Templates and Options Templates.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

3. Categorization of Anonymization Techniques

Anonymization, as described by this document, is the modification of a data set in order to protect the identity of the people or entities described by the data set from disclosure. With respect to network traffic data, anonymization generally attempts to preserve some set of properties of the network traffic useful for a given application or applications, while ensuring the data cannot be traced back to the specific networks, hosts, or users generating the traffic.

Anonymization may be broadly classified according to two properties: recoverability and countability. All anonymization techniques map the real space of identifiers or values into a separate, anonymized space, according to some function. A technique is said to be recoverable when the function used is invertible or can otherwise be reversed and a real identifier can be recovered from a given replacement identifier. "Recoverability" as used within this categorization does not refer to recoverability under attack; that is, techniques wherein the function used can only be reversed using additional information, such as an encryption key or knowledge of injected traffic within the data set, are not considered recoverable.

Countability compares the dimension of the anonymized space (N) to the dimension of the real space (M), and denotes how the count of unique values is preserved by the anonymization function.
If the anonymized space is smaller than the real space, then the function is said to generalize the input, mapping more than one input point to each anonymous value (e.g., as with aggregation). By definition, generalization is not recoverable.

If the dimensions of the anonymized and real spaces are the same, such that the count of unique values is preserved, then the function is said to be a direct substitution function. If the dimension of the anonymized space is larger, such that each real value maps to a set of anonymized values, then the function is said to be a set substitution function. Note that with set substitution functions, the sets of anonymized values are not necessarily disjoint. Either direct or set substitution functions are said to be one-way if there exists no non-brute force method for recovering the real data point from an anonymized one in isolation (i.e., if the only way to recover the data point is to attack the anonymized data set as a whole, e.g. through fingerprinting or data injection).

This classification is summarized in the table below.

+------------------------+-----------------+------------------------+
| Recoverability /       | Recoverable     | Non-recoverable        |
| Countability           |                 |                        |
+------------------------+-----------------+------------------------+
| N < M                  | N.A.            | Generalization         |
| N = M                  | Direct          | One-way Direct         |
|                        | Substitution    | Substitution           |
| N > M                  | Set             | One-way Set            |
|                        | Substitution    | Substitution           |
+------------------------+-----------------+------------------------+

4. Anonymization of IP Flow Data

In anonymizing IP flow data as treated by this document, the goal is generally two-way address untraceability: to remove the ability to assert that endpoint X contacted endpoint Y at time T.
Address untraceability is important as IP addresses are the most suitable field in IP flow records to identify real-world entities. Each IP address is associated with an interface on a network host, and can potentially be identified with a single user. Additionally, IP addresses are structured identifiers; that is, partial IP address prefixes may be used to identify networks just as full IP addresses identify hosts. This leads IP flow data anonymization to be concerned first and foremost with IP address anonymization.

Any form of aggregation which combines flows from multiple endpoints into a single record (e.g., aggregation by subnetwork, aggregation removing addressing completely) may also provide address untraceability; however, anonymization by aggregation is out of scope for this document. Additionally of potential interest in this problem space but out of scope are anonymization techniques which are applied over multiple fields or multiple records in a way which introduces dependencies among anonymized fields or records. This document is concerned solely with anonymization techniques applied at the resolution of single fields within a flow record.

Even so, attacks against these anonymization techniques use entire flows and relationships between hosts and flows within a given data set.
Therefore, fields which may not necessarily be identifying by themselves may be anonymized in order to increase the anonymity of the data set as a whole.

Due to the restricted semantics of IP flow data, there is a relatively limited set of specific anonymization techniques available on flow data, though each falls into the broad categories discussed in the previous section. Each type of field that may commonly appear in a flow record may have its own applicable specific techniques.

As with IP addresses, MAC addresses uniquely identify devices on the network; while they are not often available in traffic data collected at Layer 3, and cannot be used to locate devices within the network, some traces may contain sub-IP data including MAC address data. Hardware addresses may be mappable to device serial numbers, and to the entities or individuals who purchased the devices, when combined with external databases. MAC addresses are also often used in constructing IPv6 addresses (see section 2.5.1 of [RFC4291]), and as such may be used to reconstruct the low-order bits of anonymized IPv6 addresses in certain circumstances. Therefore, MAC address anonymization is also important.

Port numbers identify abstract entities (applications) as opposed to real-world entities, but they can be used to classify hosts and user behavior. Passive port fingerprinting, both of well-known and ephemeral ports, can be used to determine the operating system running on a host.
Relative data volumes by port can also be used to determine the host's function (workstation, web server, etc.); this information can be used to identify hosts and users.

While not identifiers in and of themselves, timestamps and counters can reveal the behavior of the hosts and users on a network. Any given network activity is recognizable by a pattern of relative time differences and data volumes in the associated sequence of flows, even without host address information. They can therefore be used to identify hosts and users. Timestamps and counters are also vulnerable to traffic injection attacks, where traffic with a known pattern is injected into a network under measurement, and this pattern is later identified in the anonymized data set.

The simplest and most extreme form of anonymization, which can be applied to any field of a flow record, is black-marker anonymization, or complete deletion of a given field. Note that black-marker anonymization is equivalent to simply not exporting the field(s) in question.

While black-marker anonymization completely protects the data in the deleted fields from the risk of disclosure, it also reduces the utility of the anonymized data set as a whole. Techniques that retain some information while reducing (though not eliminating) the disclosure risk will be extensively discussed in the following sections; note that the techniques specifically applicable to IP addresses, timestamps, ports, and counters will be discussed in separate sections.

4.1. IP Address Anonymization

Since IP addresses are the most common identifiers within flow data that can be used to directly identify a person, organization, or host, most of the work on flow and trace data anonymization has gone into IP address anonymization techniques.
Indeed, the aim of most attacks against anonymization is to recover the map from anonymized IP addresses to original IP addresses, thereby identifying the hosts. There is therefore a wide range of IP address anonymization schemes that fit into the following categories.

+------------------------------------+---------------------+
| Scheme                             | Action              |
+------------------------------------+---------------------+
| Truncation                         | Generalization      |
| Reverse Truncation                 | Generalization      |
| Permutation                        | Direct Substitution |
| Prefix-preserving Pseudonymization | Direct Substitution |
+------------------------------------+---------------------+

4.1.1. Truncation

Truncation removes "n" of the least significant bits from an IP address, replacing them with zeroes. In effect, it replaces a host address with a network address for some fixed netblock; for IPv4 addresses, 8-bit truncation corresponds to replacement with a /24 network address. Truncation is a non-reversible generalization scheme. Note that while truncation is effective for making hosts non-identifiable, it preserves information which can be used to identify an organization, a geographic region, a country, or a continent.

Truncation to an address length of 0 is equivalent to black-marker anonymization. Complete removal of IP address information is only recommended for analysis tasks which have no need to separate flow data by host or network; e.g. as a first stage to per-application (port) or time-series total volume analyses.

4.1.2. Reverse Truncation

Reverse truncation removes "n" of the most significant bits from an IP address, replacing them with zeroes. Reverse truncation is a non-reversible generalization scheme.
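Both truncation and reverse truncation amount to applying a bit mask to the integer form of the address. The following is a minimal Python sketch of that idea, not part of this specification; the function names truncate and reverse_truncate are hypothetical and chosen only for illustration.

```python
import ipaddress


def truncate(addr: str, n: int) -> str:
    """Zero the n least significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    mask = ((1 << width) - 1) ^ ((1 << n) - 1)  # keep the high width-n bits
    return str(type(ip)(int(ip) & mask))


def reverse_truncate(addr: str, n: int) -> str:
    """Zero the n most significant bits of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    width = ip.max_prefixlen
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    mask = (1 << (width - n)) - 1  # keep the low width-n bits
    return str(type(ip)(int(ip) & mask))


print(truncate("192.0.2.77", 8))           # 192.0.2.0
print(reverse_truncate("192.0.2.77", 24))  # 0.0.0.77
```

With n equal to the full address width, either function returns the all-zeroes address, which corresponds to the black-marker case.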
Reverse truncation is effective for making networks unidentifiable, partially or completely removing information which can be used to identify an organization, a geographic region, a country, or a continent (or RIR region of responsibility). However, it may cause ambiguity when applied to data collected from more than one network, since it treats all the hosts with the same address on different networks as if they are the same host. It is not particularly useful when publishing data where the network of origin is known or can be easily guessed by virtue of the identity of the publisher.

Like truncation, reverse truncation to an address length of 0 is equivalent to black-marker anonymization.

4.1.3. Permutation

Permutation is a direct substitution technique, replacing each IP address with an address selected from the set of possible IP addresses, such that each anonymized address represents a unique original address. The selection function is often random, though it is not necessarily so. Permutation does not preserve any structural information about a network, but it does preserve the unique count of IP addresses. Any application that requires more structure than host-uniqueness will not be able to use permuted IP addresses.

There are many variations of permutation functions, each of which has tradeoffs in performance, security, and guarantees of non-collision; evaluating these tradeoffs is implementation independent. However, in general permutation functions applied to anonymization SHOULD be difficult to reverse without knowing the parameters (e.g., a secret key for HMAC). Given the relatively small space of IPv4 addresses in particular, hash functions applied without additional parameters could be reversed through brute force if the hash function is known, and SHOULD NOT be used as permutation functions.
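As an illustration of a keyed permutation, the sketch below builds a collision-free mapping over the 32-bit IPv4 space from a small Feistel network whose round function is HMAC-SHA-256. This construction is an assumption made for the example and is not recommended or defined by this document; the names permute_ipv4 and unpermute_ipv4 and the round count are hypothetical, and the construction has not been evaluated for resistance to attack.

```python
import hashlib
import hmac
import ipaddress

ROUNDS = 4  # assumption: illustrative round count, not a vetted parameter


def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: 16-bit input half -> 16-bit output."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "big")


def permute_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to another one; a bijection for a fixed key."""
    value = int(ipaddress.IPv4Address(addr))
    left, right = value >> 16, value & 0xFFFF
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return str(ipaddress.IPv4Address((left << 16) | right))


def unpermute_ipv4(addr: str, key: bytes) -> str:
    """Invert permute_ipv4; possible only with knowledge of the key."""
    value = int(ipaddress.IPv4Address(addr))
    left, right = value >> 16, value & 0xFFFF
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return str(ipaddress.IPv4Address((left << 16) | right))


key = b"example-secret-key"  # hypothetical key, for illustration only
anonymized = permute_ipv4("192.0.2.1", key)
assert unpermute_ipv4(anonymized, key) == "192.0.2.1"
```

Because a Feistel network is invertible by construction, no two original addresses map to the same anonymized address under one key, and no per-address state is needed. A plain truncated HMAC would be simpler but could collide.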
Permutation functions may guarantee noncollision (i.e., that each anonymized address represents a unique original address), but need not; however, the probability of collision SHOULD be low. We treat even permutations with low but nonzero collision probability as direct substitution nevertheless. Beyond these guidelines, recommendations for specific permutation functions are out of scope for this document.

4.1.4. Prefix-preserving Pseudonymization

Prefix-preserving pseudonymization is a direct substitution technique, like permutation but further restricted such that the structure of subnets is preserved at each level while anonymizing IP addresses. If two real IP addresses match on a prefix of "n" bits, the two anonymized IP addresses will match on a prefix of "n" bits as well. This is useful when relationships among networks must be preserved for a given analysis task, but introduces structure into the anonymized data which can be exploited in attacks against the anonymization technique.

Scanning in Internet background traffic can cause particular problems with this technique: if a scanner uses a predictable and known sequence of addresses, this information can be used to reverse the substitution. The low order portion of the address can be left unanonymized as a partial defense against this attack.

4.2. MAC Address Anonymization

Flow data containing sub-IP information can also contain identifying information in the form of the hardware (MAC) address. While MAC address information cannot be used to locate a node within a network, it can be used to uniquely identify a specific device.
Vendors or organizations within the supply chain may then have the information necessary to identify the entity or individual that purchased the device.

MAC address information is not as structured as IP address information. EUI-48 and EUI-64 MAC addresses contain an Organizationally Unique Identifier (OUI) in the three most significant bytes of the address; this OUI additionally contains bits noting whether the address is locally or globally administered. Beyond this, there is no standard relationship among the OUIs assigned to a given vendor.

Note that MAC address information also appears within IPv6 addresses, as the EUI-64 address, or EUI-48 address encoded as an EUI-64 address, is used as the least significant 64 bits of the IPv6 address in the case of link local addressing or stateless autoconfiguration; the considerations and techniques in this section may then apply to such IPv6 addresses as well.

+-----------------------------+---------------------+
| Scheme                      | Action              |
+-----------------------------+---------------------+
| Truncation                  | Generalization      |
| Reverse Truncation          | Generalization      |
| Permutation                 | Direct Substitution |
| Structured Pseudonymization | Direct Substitution |
+-----------------------------+---------------------+

4.2.1. Truncation

Truncation removes "n" of the least significant bits from a MAC address, replacing them with zeroes. In effect, it retains bits of OUI, which identifies the manufacturer, while removing the least significant bits identifying the particular device. Truncation of 24 bits of an EUI-48 or 40 bits of an EUI-64 address zeroes out the device identifier while retaining the OUI.

Truncation is effective for making device manufacturers partially or completely identifiable within a dataset while deleting unique host identifiers; this can be used to retain and aggregate MAC layer behavior by vendor.
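The 24-bit and 40-bit cases described above can be sketched in Python as follows. This is an illustrative example only; the function name truncate_mac and the colon-separated textual format are assumptions, not something defined by this document.

```python
def truncate_mac(mac: str, n: int) -> str:
    """Zero the n least significant bits of an EUI-48 or EUI-64 address."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    width = len(octets) * 8
    if width not in (48, 64):
        raise ValueError("expected an EUI-48 or EUI-64 address")
    if not 0 <= n <= width:
        raise ValueError("n must be between 0 and the address width")
    value = int.from_bytes(octets, "big")
    value &= ~((1 << n) - 1)  # clear the low n bits, keep the rest
    return ":".join(f"{b:02x}" for b in value.to_bytes(len(octets), "big"))


# Keep only the OUI of an EUI-48 address (24-bit truncation).
print(truncate_mac("00:00:5e:00:53:af", 24))  # 00:00:5e:00:00:00
```

Passing n equal to the full width (48 or 64) yields the all-zeroes address.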
   Truncation to an address length of 0 is equivalent to black-marker
   anonymization.

4.2.2.  Reverse Truncation

   Reverse truncation removes "n" of the most significant bits from a
   MAC address, replacing them with zeroes.  Reverse truncation is a
   non-reversible generalization scheme.  This has the effect of
   removing bits of the OUI, which identify manufacturers, before
   removing the least significant bits.  Reverse truncation of 24 bits
   zeroes out the OUI.

   Reverse truncation is effective for making device manufacturers
   partially or completely unidentifiable within a dataset.  However,
   it may cause ambiguity by introducing the possibility of truncated
   MAC address collision.  Also note that the utility of removing
   manufacturer information is not particularly well-covered by the
   literature.

   Reverse truncation to an address length of 0 is equivalent to
   black-marker anonymization.

4.2.3.  Permutation

   Permutation is a direct substitution technique, replacing each MAC
   address with an address selected from the set of possible MAC
   addresses, such that each anonymized address represents a unique
   original address.  The selection function is often random, though it
   is not necessarily so.  Permutation does not preserve any structural
   information about a network, but it does preserve the unique count
   of devices on the network.  Any application that requires more
   structure than host-uniqueness will not be able to use permuted MAC
   addresses.

   There are many variations of permutation functions, each of which
   has tradeoffs in performance, security, and guarantees of
   non-collision; evaluating these tradeoffs is implementation
   independent.  However, in general permutation functions applied to
   anonymization SHOULD be difficult to reverse without knowing the
   parameters (e.g., a secret key for HMAC).
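
   A keyed substitution of the kind mentioned above might be sketched
   as follows.  This is an illustration under stated assumptions, not a
   recommended function: the name permute_mac, the choice of
   HMAC-SHA-256, and the truncation of the digest to the address length
   are all choices made for this example.  Because the digest is
   truncated, collisions are possible with low but nonzero probability.

```python
import hashlib
import hmac


def permute_mac(mac: str, key: bytes) -> str:
    """Map a MAC address to a pseudonym using a keyed hash (HMAC).

    Assumption for this sketch: the digest is cut to the length of the
    input address, so the mapping is not a strict permutation.
    """
    raw = bytes.fromhex(mac.replace(":", ""))
    digest = hmac.new(key, raw, hashlib.sha256).digest()
    pseudonym = digest[:len(raw)]      # 6 bytes for EUI-48, 8 for EUI-64
    return ":".join(f"{b:02x}" for b in pseudonym)


# The same key always yields the same pseudonym for a given address.
key = b"example key, kept secret by the operator"
print(permute_mac("02:00:5e:10:20:30", key))
```

   Without the key, reversing the mapping requires more than
   enumerating the address space through a known hash function.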
   While the EUI-48 space is larger than the IPv4 address space, hash
   functions applied without additional parameters could be reversed
   through brute force if the hash function is known, and SHOULD NOT be
   used as permutation functions.

   Permutation functions may guarantee noncollision (i.e., that each
   anonymized address represents a unique original address), but need
   not; however, the probability of collision SHOULD be low.  We treat
   even permutations with low but nonzero collision probability as
   direct substitution nevertheless.  Beyond these guidelines,
   recommendations for specific permutation functions are out of scope
   for this document.

4.2.4.  Structured Pseudonymization

   Structured pseudonymization for MAC addresses is a direct
   substitution technique, like permutation, but restricted such that
   the OUI (the most significant three bytes) is permuted separately
   from the node identifier, the remainder.  This is useful when the
   uniqueness of OUIs must be preserved for a given analysis task, but
   introduces structure into the anonymized data which can be exploited
   in attacks against the anonymization technique.

4.3.  Timestamp Anonymization

   The particular time at which a flow began or ended is not
   particularly identifiable information, but it can be used as part of
   attacks against other anonymization techniques or for user
   profiling, e.g. as in [Mur07].
   Timestamps can be used in traffic injection attacks, which use known
   information about a set of traffic generated or otherwise known by
   an attacker to recover mappings of other anonymized fields, as well
   as to identify certain activity by response delay and size
   fingerprinting, which compares response sizes and inter-flow times
   in anonymized data to known values.  Note that these attacks have
   been shown to be relatively robust against timestamp anonymization
   techniques (see [Bur10]), so the techniques presented in this
   section are relatively weak and should be used with care.

   +-----------------------+----------------------------+
   | Scheme                | Action                     |
   +-----------------------+----------------------------+
   | Precision Degradation | Generalization             |
   | Enumeration           | Direct or Set Substitution |
   | Random Shifts         | Direct Substitution        |
   +-----------------------+----------------------------+

4.3.1.  Precision Degradation

   Precision Degradation is a generalization technique that removes the
   most precise components of a timestamp, treating all events
   occurring in each given interval (e.g. one millisecond for
   millisecond level degradation) as simultaneous.  This has the effect
   of potentially collapsing many timestamps into one.  With this
   technique time precision is reduced, and sequencing may be lost, but
   the approximate time at which the event occurred is preserved.  The
   anonymized data may not be generally useful for applications which
   require strict sequencing of flows.

   Note that flow meters with low time precision (e.g. second
   precision, or millisecond precision on high-capacity networks)
   perform the equivalent of precision degradation anonymization by
   their design.

   Note also that degradation to a very low precision (e.g.
   on the order of minutes, hours, or days) is commonly used in
   analyses operating on time-series aggregated data, and may also be
   described as binning; though the time scales are longer and
   applicability more restricted, this is in principle the same
   operation.

   Precision degradation to infinitely low precision is equivalent to
   black-marker anonymization.  Removal of timestamp information is
   only recommended for analysis tasks which have no need to separate
   flows in time, for example for counting total volumes or unique
   occurrences of other flow keys in an entire dataset.

4.3.2.  Enumeration

   Enumeration is a substitution function that retains the
   chronological order in which events occurred while eliminating time
   information.  Timestamps are substituted by equidistant timestamps
   (or numbers) starting from a randomly chosen start value.  The
   resulting data is useful for applications requiring strict
   sequencing, but not for those requiring good timing information
   (e.g. delay or jitter measurement for quality-of-service (QoS)
   applications or service-level agreement (SLA) validation).

   Note that enumeration is functionally equivalent to precision
   degradation in any environment into which traffic can be regularly
   injected to serve as a clock at the precision of the frequency of
   the injected flows.

4.3.3.  Random Shifts

   Random time shifts add a random offset to every timestamp within a
   dataset.  This reversible substitution technique therefore retains
   duration and inter-event interval information as well as
   chronological order of flows.  Random time shifts are quite weak,
   and relatively easy to reverse in the presence of external knowledge
   about traffic on the measured network.

4.4.  Counter Anonymization

   Like timestamps, counters (such as packet and octet volumes per
   flow) can be used in fingerprinting and injection attacks against
   anonymization, or for user profiling.  Data sets with anonymized
   counters are useful only for analysis tasks for which relative or
   imprecise magnitudes of activity are useful.  Counter information
   can also be completely removed, but this is only recommended for
   analysis tasks which have no need to evaluate the removed counter,
   for example for counting only unique occurrences of other flow keys.

   +-----------------------+----------------------------+
   | Scheme                | Action                     |
   +-----------------------+----------------------------+
   | Precision Degradation | Generalization             |
   | Binning               | Generalization             |
   | Random noise addition | Direct or Set Substitution |
   +-----------------------+----------------------------+

4.4.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation
   of counters removes lower-order bits of the counters, treating all
   the counters in a given range as having the same value.  Depending
   on the precision reduction, this loses information about the
   relationships between sizes of similarly-sized flows, but keeps
   relative magnitude information.  Precision degradation to an
   infinitely low precision is equivalent to black-marker
   anonymization.

4.4.2.  Binning

   Binning can be seen as a special case of precision degradation; the
   operation is identical, except that in precision degradation the
   counter ranges are uniform, and in binning they need not be.  For
   example, consider separating unopened TCP connections from
   potentially opened TCP connections.
   Here, packet counters per flow would be binned into two bins, one
   for 1-2 packet flows, and one for flows with 3 or more packets.
   Binning schemes are generally chosen to keep precisely the amount of
   information required in a counter for a given analysis task.  Note
   that, also unlike precision degradation, the bin label need not be
   within the bin's range.  Binning counters to a single bin is
   equivalent to black-marker anonymization.

4.4.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each
   flow; this is used to keep relative magnitude information and
   minimize the disruption to size relationship information while
   avoiding fingerprinting attacks against anonymization.  Note that
   there is no guarantee that random noise addition will maintain
   ranking order by a counter among members of a set.  Random noise
   addition is particularly useful when the derived analysis data will
   not be presented in such a way as to require the lower-order bits of
   the counters.

4.5.  Anonymization of Other Flow Fields

   Other fields, particularly port numbers and protocol numbers, can be
   used to partially identify the applications that generated the
   traffic in a given flow trace.  This information can be used in
   fingerprinting attacks, and may be of interest on its own (e.g., to
   reveal that a certain application with suspected vulnerabilities is
   running on a given network).  These fields are generally anonymized
   using one of two techniques.

   +-------------+---------------------+
   | Scheme      | Action              |
   +-------------+---------------------+
   | Binning     | Generalization      |
   | Permutation | Direct Substitution |
   +-------------+---------------------+

4.5.1.  Binning

   Binning is a generalization technique mapping a set of potentially
   non-uniform ranges into a set of arbitrarily labeled bins.  Common
   bin arrangements depend on the field type and the analysis
   application.  For example, an IP protocol bin arrangement may
   preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all
   other protocols into a single bin, to mitigate the use of uncommon
   protocols in fingerprinting attacks.  Another example arrangement
   may bin source and destination ports into low (0-1023) and high
   (1024-65535) bins in order to tell service from ephemeral ports
   without identifying individual applications.

   Binning other flow key fields to a single bin is equivalent to
   black-marker anonymization.  Removal of other flow key information
   is only recommended for analysis tasks which have no need to
   differentiate flows on the removed keys, for example for total
   traffic counts or unique counts of other flow keys.

4.5.2.  Permutation

   Permutation is a direct substitution technique, replacing each value
   with a value selected from the set of possible values, such that
   each anonymized value represents a unique original value.  This is
   used to preserve the count of unique values without preserving
   information about, or the ordering of, the values themselves.

   While permutation ideally guarantees that each anonymized value
   represents a unique original value, this may require significant
   state in the Intermediate Anonymization Process.  Therefore,
   permutation may be implemented by hashing for performance reasons,
   with hash functions that may have relatively small collision
   probabilities.  Such techniques are still essentially direct
   substitution techniques, despite the nonzero error probability.

5.  Parameters for the Description of Anonymization Techniques

   This section details the abstract parameters used to describe the
   anonymization techniques examined in the previous section, on a
   per-parameter basis.  These parameters and their export safety
   inform the design of the IPFIX anonymization metadata export
   specified in the following section.

5.1.  Stability

   A stable anonymization will always map a given value in the real
   space to a given value in the anonymized space, while an unstable
   anonymization will change this mapping over time; a completely
   unstable anonymization is essentially indistinguishable from
   black-marker anonymization.  Any given anonymization technique may
   be applied with a varying range of stability.  Stability is
   important for assessing the comparability of anonymized information
   in different data sets, or in the same data set over different time
   periods.  In practice, an anonymization may also be stable for every
   data set published by a particular producer to a particular
   consumer, stable for a stated time period within a dataset or across
   datasets, or stable only for a single data set.

   If no information about stability is available, users of anonymized
   data MAY assume that the techniques used are stable across the
   entire dataset, but unstable across datasets.  Note that stability
   presents a risk-utility tradeoff, as completely stable anonymization
   can be used for longer-term trend analysis tasks but also presents
   more risk of attack given the stable mapping.  Information about the
   stability of a mapping SHOULD be exported along with the anonymized
   data.

5.2.  Truncation Length

   Truncation and precision degradation are described by the truncation
   length, or the amount of data still remaining in the anonymized
   field after anonymization.
   Truncation length can generally be inferred from a given data set,
   and need not be specially exported or protected.  For bit-level
   truncation, the truncated bits are generally inferable by the least
   significant bit set for an instance of an Information Element
   described by a given Template (or the most significant bit set, in
   the case of reverse truncation).  For precision degradation, the
   truncation is inferable from the maximum precision given.  Note that
   while this inference method is generally applicable, it is
   data-dependent: there is no guarantee that it will recover the exact
   truncation length used to prepare the data.

   In the special case of IP address export with variable (per-record)
   truncation, the truncation MAY be expressed by exporting the prefix
   length alongside the address.

5.3.  Bin Map

   Binning is described by the specification of a bin mapping function.
   This function can be generally expressed in terms of an associative
   array that maps each point in the original space to a bin, although
   from an implementation standpoint most bin functions are much
   simpler and more efficient.

   Since the bin map for a bin mapping function is in essence the bin
   mapping key, and can be used to partially deanonymize binned data,
   depending on the degree of generalization, information about the bin
   mapping function SHOULD NOT be exported.

5.4.  Permutation

   Like binning, permutation is described by the specification of a
   permutation function.  In the general case, this can be expressed in
   terms of an associative array that maps each point in the original
   space to a point in the anonymized space.  Unlike binning, each
   point in the anonymized space corresponds to a single, unique point
   in the original space.
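
   The difference between a bin map and a permutation in the
   associative-array view above might be sketched as follows.  This is
   an illustration only: the names port_bin and make_permutation are
   assumptions made for this example, the low/high port bins follow the
   example arrangement in Section 4.5.1, and Python's random.Random is
   not a cryptographically secure source, so it would not be suitable
   for real anonymization.

```python
import random


def port_bin(port: int) -> str:
    """Bin map: many original values share one bin label (many-to-one)."""
    return "low" if port <= 1023 else "high"


def make_permutation(values, seed):
    """Permutation as an associative array (one-to-one).

    Illustration only: random.Random is NOT cryptographically secure.
    """
    original = list(values)
    shuffled = original[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(original, shuffled))


# Permute the 256 possible IP protocol numbers.
protocol_map = make_permutation(range(256), seed=2011)

# Each anonymized value corresponds to exactly one original value.
print(len(set(protocol_map.values())) == len(protocol_map))
```

   In this sketch the seed plays the role of the key: anyone holding it
   can rebuild the array and invert the mapping.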
   Since the parameters of the permutation function are in essence
   key-like (indeed, for cryptographic permutation functions, they are
   the keys themselves), information about the permutation function or
   its parameters SHOULD NOT be exported.

5.5.  Shift Amount

   Shifting requires an amount to shift each value by.  Since the shift
   amount is the only key to a shift function, and can be used to
   trivially deanonymize data protected by shifting, information about
   the shift amount SHOULD NOT be exported.

6.  Anonymization Export Support in IPFIX

   Anonymized data exported via IPFIX SHOULD be annotated with
   anonymization metadata, which details which fields described by
   which Templates are anonymized, and provides appropriate information
   on the anonymization techniques used.  This metadata SHOULD be
   exported in Data Records described by the recommended Options
   Templates described in this section; these Options Templates use the
   additional Information Elements described in the following
   subsection.

   Note that fields anonymized using the black-marker (removal)
   technique do not require any special metadata support: black-marker
   anonymized fields SHOULD NOT be exported at all, by omitting the
   corresponding Information Elements from the Template describing the
   Data Set.  In the case where application requirements dictate that a
   black-marker anonymized field must remain in a Template, then an
   Exporting Process MAY export black-marker anonymized fields with
   their native length as all-zeros, but only in cases where enough
   contextual information exists within the record to differentiate a
   black-marker anonymized field exported in this way from a real zero
   value.
6.1.  Anonymization Records and the Anonymization Options Template

   The Anonymization Options Template describes Anonymization Records,
   which allow anonymization metadata to be exported inline over IPFIX
   or stored in an IPFIX File, by binding information about
   anonymization techniques to Information Elements within defined
   Templates or Options Templates.  IPFIX Exporting Processes SHOULD
   export anonymization records for any Template describing exported
   anonymized Data Records; IPFIX Collecting Processes and processes
   downstream from them MAY use anonymization records to treat
   anonymized data differently depending on the applied technique.

   Anonymization Records contain ancillary information bound to a
   Template, so many of the considerations for Templates apply to
   Anonymization Records as well.  First, reliability is important: an
   Exporting Process SHOULD export Anonymization Records after the
   Templates they describe have been exported, and SHOULD export
   anonymization records reliably if supported by the underlying
   transport (i.e., without partial reliability when using SCTP).

   Anonymization Records MUST be handled by Collecting Processes as
   scoped to the Template to which they apply within the Transport
   Session in which they are sent.  When a Template is withdrawn via a
   Template Withdrawal Message or expires during a UDP transport
   session, the accompanying Anonymization Records are withdrawn or
   expire as well, and do not apply to subsequent Templates with the
   same Template ID within the Session unless re-exported.
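
   The scoping rule above might be handled in a Collecting Process
   along the following lines.  This is a minimal sketch, not part of
   any IPFIX library: the class and method names are hypothetical, the
   session identifier is whatever the implementation uses to tell
   Transport Sessions apart, and lookups for fields without a record
   fall back to anonymizationTechnique value 0 (Undefined).

```python
from dataclasses import dataclass

UNDEFINED = 0  # anonymizationTechnique 0: no representation is made


@dataclass(frozen=True)
class AnonymizationRecord:
    technique: int   # anonymizationTechnique value
    flags: int = 0   # anonymizationFlags value


class AnonymizationRecordStore:
    """Hypothetical collector-side bookkeeping for Anonymization Records."""

    def __init__(self):
        # (session_id, template_id) -> {(pen, ie_id, ie_index): record}
        self._by_template = {}

    def add(self, session_id, template_id, ie_id, record, pen=0, ie_index=0):
        key = (session_id, template_id)
        self._by_template.setdefault(key, {})[(pen, ie_id, ie_index)] = record

    def withdraw_template(self, session_id, template_id):
        # Records are withdrawn or expire together with their Template.
        self._by_template.pop((session_id, template_id), None)

    def end_session(self, session_id):
        # Records never carry over into a new Transport Session.
        for key in [k for k in self._by_template if k[0] == session_id]:
            del self._by_template[key]

    def technique(self, session_id, template_id, ie_id, pen=0, ie_index=0):
        fields = self._by_template.get((session_id, template_id), {})
        record = fields.get((pen, ie_id, ie_index))
        return record.technique if record else UNDEFINED


store = AnonymizationRecordStore()
# Template 256, IE 8 (sourceIPv4Address), technique 5 (Permutation).
store.add("session-1", 256, 8, AnonymizationRecord(technique=5))
store.withdraw_template("session-1", 256)
print(store.technique("session-1", 256, 8))  # back to Undefined (0)
```

   Keying the store on both the session and the Template ID means that
   a reused Template ID does not inherit stale metadata.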
   The Stability Class within the anonymizationFlags IE can be used to
   declare that a given anonymization technique's mapping will remain
   stable across multiple sessions, but this does not mean that
   anonymization technique information given in the Anonymization
   Records themselves persists across Sessions.  Each new Transport
   Session MUST contain new Anonymization Records for each Template
   describing anonymized Data Sets.

   SCTP per-stream export [I-D.ietf-ipfix-export-per-sctp-stream] may
   be used to ease management of Anonymization Records if appropriate
   for the application.

   The fields of the Anonymization Options Template are as follows:

   +-------------------------+-----------------------------------------+
   | IE                      | Description                             |
   +-------------------------+-----------------------------------------+
   | templateId [scope]      | The Template ID of the Template or      |
   |                         | Options Template containing the         |
   |                         | Information Element described by this   |
   |                         | anonymization record. This Information  |
   |                         | Element MUST be defined as a Scope      |
   |                         | Field.                                  |
   | informationElementId    | The Information Element identifier of   |
   | [scope]                 | the Information Element described by    |
   |                         | this anonymization record. This         |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field. Exporting Processes      |
   |                         | MUST clear the Enterprise bit of the    |
   |                         | informationElementId and Collecting     |
   |                         | Processes SHOULD ignore it; information |
   |                         | about enterprise-specific Information   |
   |                         | Elements is exported via the            |
   |                         | privateEnterpriseNumber Information     |
   |                         | Element.                                |
   | privateEnterpriseNumber | The Private Enterprise Number of the    |
   | [scope] [optional]      | enterprise-specific Information Element |
   |                         | described by this anonymization record. |
   |                         | This Information Element MUST be        |
   |                         | defined as a Scope Field if present. A  |
   |                         | privateEnterpriseNumber of 0 signifies  |
   |                         | that the Information Element is         |
   |                         | IANA-registered.                        |
   | informationElementIndex | The Information Element index of the    |
   | [scope] [optional]      | instance of the Information Element     |
   |                         | described by this anonymization record  |
   |                         | identified by the informationElementId  |
   |                         | within the Template. Optional; need     |
   |                         | only be present when describing         |
   |                         | Templates that have multiple instances  |
   |                         | of the same Information Element. This   |
   |                         | Information Element MUST be defined as  |
   |                         | a Scope Field if present. This          |
   |                         | Information Element is defined in       |
   |                         | Section 6.2.1, below.                   |
   | anonymizationFlags      | Flags describing the mapping stability  |
   |                         | and specialized modifications to the    |
   |                         | Anonymization Technique in use. SHOULD  |
   |                         | be present. This Information Element    |
   |                         | is defined in Section 6.2.3, below.     |
   | anonymizationTechnique  | The technique used to anonymize the     |
   |                         | data. MUST be present. This             |
   |                         | Information Element is defined in       |
   |                         | Section 6.2.2, below.                   |
   +-------------------------+-----------------------------------------+

6.2.  Recommended Information Elements for Anonymization Metadata

6.2.1.  informationElementIndex

   Description:  A zero-based index of an Information Element
      referenced by informationElementId within a Template referenced
      by templateId; used to disambiguate scope for templates
      containing multiple identical Information Elements.

   Abstract Data Type:  unsigned16

   Data Type Semantics:  identifier

   ElementId:  TBD3

   Status:  Current

6.2.2.  anonymizationTechnique

   Description:  A description of the anonymization technique applied
      to a referenced Information Element within a referenced Template.
      Each technique may be applicable only to certain Information
      Elements and recommended only for certain Information Elements;
      these restrictions are noted in the table below.
   +-------+---------------------------+-----------------+-------------+
   | Value | Description               | Applicable to   | Recommended |
   |       |                           |                 | for         |
   +-------+---------------------------+-----------------+-------------+
   | 0     | Undefined: the Exporting  | all             | all         |
   |       | Process makes no          |                 |             |
   |       | representation as to      |                 |             |
   |       | whether the defined field |                 |             |
   |       | is anonymized or not.     |                 |             |
   |       | While the Collecting      |                 |             |
   |       | Process MAY assume that   |                 |             |
   |       | the field is not          |                 |             |
   |       | anonymized, it is not     |                 |             |
   |       | guaranteed not to be.     |                 |             |
   |       | This is the default       |                 |             |
   |       | anonymization technique.  |                 |             |
   | 1     | None: the values exported | all             | all         |
   |       | are real.                 |                 |             |
   | 2     | Precision                 | all             | all         |
   |       | Degradation/Truncation:   |                 |             |
   |       | the values exported are   |                 |             |
   |       | anonymized using simple   |                 |             |
   |       | precision degradation or  |                 |             |
   |       | truncation. The new       |                 |             |
   |       | precision or number of    |                 |             |
   |       | truncated bits is         |                 |             |
   |       | implicit in the exported  |                 |             |
   |       | data, and can be deduced  |                 |             |
   |       | by the Collecting         |                 |             |
   |       | Process.                  |                 |             |
   | 3     | Binning: the values       | all             | all         |
   |       | exported are anonymized   |                 |             |
   |       | into bins.                |                 |             |
   | 4     | Enumeration: the values   | all             | timestamps  |
   |       | exported are anonymized   |                 |             |
   |       | by enumeration.           |                 |             |
   | 5     | Permutation: the values   | all             | identifiers |
   |       | exported are anonymized   |                 |             |
   |       | by permutation.           |                 |             |
   | 6     | Structured Permutation:   | addresses       |             |
   |       | the values exported are   |                 |             |
   |       | anonymized by             |                 |             |
   |       | permutation, preserving   |                 |             |
   |       | bit-level structure as    |                 |             |
   |       | appropriate; this         |                 |             |
   |       | represents                |                 |             |
   |       | prefix-preserving IP      |                 |             |
   |       | address anonymization or  |                 |             |
   |       | structured MAC address    |                 |             |
   |       | anonymization.            |                 |             |
   | 7     | Reverse Truncation: the   | addresses       |             |
   |       | values exported are       |                 |             |
   |       | anonymized using reverse  |                 |             |
   |       | truncation. The number    |                 |             |
   |       | of truncated bits is      |                 |             |
   |       | implicit in the exported  |                 |             |
   |       | data, and can be deduced  |                 |             |
   |       | by the Collecting         |                 |             |
   |       | Process.                  |                 |             |
   | 8     | Noise: the values         | non-identifiers | counters    |
   |       | exported are anonymized   |                 |             |
   |       | by adding random noise to |                 |             |
   |       | each value.               |                 |             |
   | 9     | Offset: the values        | all             | timestamps  |
   |       | exported are anonymized   |                 |             |
   |       | by adding a single offset |                 |             |
   |       | to all values.            |                 |             |
   +-------+---------------------------+-----------------+-------------+

   Abstract Data Type:  unsigned16

   Data Type Semantics:  identifier

   ElementId:  TBD2

   Status:  Current

6.2.3.  anonymizationFlags

   Description:  A flag word describing specialized modifications to
      the anonymization policy in effect for the anonymization
      technique applied to a referenced Information Element within a
      referenced Template.  When flags are clear (0), the normal policy
      (as described by anonymizationTechnique) applies without
      modification.

       MSB  14  13  12  11  10   9   8   7   6   5   4   3   2   1 LSB
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      |                Reserved                       |LOR|PmA|   SC  |
      +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

                          anonymizationFlags IE

   +--------+----------+-----------------------------------------------+
   | bit(s) | name     | description                                   |
   | (LSB = |          |                                               |
   | 0)     |          |                                               |
   +--------+----------+-----------------------------------------------+
   | 0-1    | SC       | Stability Class: see the Stability Class      |
   |        |          | table below, and Section 5.1.                 |
   | 2      | PmA      | Perimeter Anonymization: when set (1),        |
   |        |          | source- Information Elements as described in  |
   |        |          | [RFC5103] are interpreted as external         |
   |        |          | addresses, and destination- Information       |
   |        |          | Elements as described in [RFC5103] are        |
   |        |          | interpreted as internal addresses, for the    |
   |        |          | purposes of associating                       |
   |        |          | anonymizationTechnique to Information         |
   |        |          | Elements only; see Section 7.2.2 for details. |
   |        |          | This bit MUST NOT be set when associated with |
   |        |          | a non-endpoint Information Element (i.e., one |
   |        |          | that is neither a source- nor a destination-  |
   |        |          | Information Element). SHOULD be consistent    |
   |        |          | within a record (i.e., if a source-           |
   |        |          | Information Element has this flag set, the    |
   |        |          | corresponding destination- element SHOULD     |
   |        |          | have this flag set, and vice-versa).          |
   | 3      | LOR      | Low-Order Unchanged: when set (1), the        |
   |        |          | low-order bits of the anonymized Information  |
   |        |          | Element contain real data. This modification  |
   |        |          | is intended for the anonymization of          |
   |        |          | network-level addresses while leaving         |
   |        |          | host-level addresses intact in order to       |
   |        |          | preserve host-level structure, which could    |
   |        |          | otherwise be used to reverse anonymization.   |
   |        |          | MUST NOT be set when associated with a        |
   |        |          | truncation-based anonymizationTechnique.      |
   | 4-15   | Reserved | Reserved for future use: SHOULD be cleared    |
   |        |          | (0) by the Exporting Process and MUST be      |
   |        |          | ignored by the Collecting Process.            |
   +--------+----------+-----------------------------------------------+

      The Stability Class portion of this flags word describes the
      stability class of the anonymization technique applied to a
      referenced Information Element within a referenced Template.
      Stability classes refer to the stability of the parameters of the
      anonymization technique, and therefore the comparability of the
      mapping between the real and anonymized values over time.
      This determines which anonymized datasets may be compared with
      each other.  Values are as follows:

   +-----+-----+-------------------------------------------------------+
   | Bit | Bit | Description                                           |
   | 1   | 0   |                                                       |
   +-----+-----+-------------------------------------------------------+
   | 0   | 0   | Undefined: the Exporting Process makes no             |
   |     |     | representation as to how stable the mapping is, or    |
   |     |     | over what time period values of this field will       |
   |     |     | remain comparable; while the Collecting Process MAY   |
   |     |     | assume Session level stability, Session level         |
   |     |     | stability is not guaranteed. Processes SHOULD assume  |
   |     |     | this is the case in the absence of stability class    |
   |     |     | information; this is the default stability class.     |
   | 0   | 1   | Session: the Exporting Process will ensure that the   |
   |     |     | parameters of the anonymization technique are stable  |
   |     |     | during the Transport Session. All the values of the   |
   |     |     | described Information Element for each Record         |
   |     |     | described by the referenced Template within the       |
   |     |     | Transport Session are comparable. The Exporting       |
   |     |     | Process SHOULD endeavour to ensure at least this      |
   |     |     | stability class.                                      |
   | 1   | 0   | Exporter-Collector Pair: the Exporting Process will   |
   |     |     | ensure that the parameters of the anonymization       |
   |     |     | technique are stable across Transport Sessions over   |
   |     |     | time with the given Collecting Process, but may use   |
   |     |     | different parameters for different Collecting         |
   |     |     | Processes. Data exported to different Collecting      |
   |     |     | Processes are not comparable.                         |
   | 1   | 1   | Stable: the Exporting Process will ensure that the    |
   |     |     | parameters of the anonymization technique are stable  |
   |     |     | across Transport Sessions over time, regardless of    |
   |     |     | the Collecting Process to which it is sent.           |
   +-----+-----+-------------------------------------------------------+

   Abstract Data Type:  unsigned16

   Data Type Semantics:  flags

   ElementId:  TBD1

   Status:  Current

7.  Applying Anonymization Techniques to IPFIX Export and Storage

   When exporting or storing anonymized flow data using IPFIX, certain
   interactions between the IPFIX Protocol and the anonymization
   techniques in use must be considered; these are treated in the
   subsections below.

7.1.  Arrangement of Processes in IPFIX Anonymization

   Anonymization may be applied to IPFIX data at three stages within
   the collection infrastructure: on initial export, at a mediator, or
   after collection, as shown in Figure 1.  Each of these locations has
   specific considerations and applicability.

   +==========================================+
   | Exporting Process                        |
   +==========================================+
     |                                      |
     |  (Anonymized at Original Exporter)   |
     V                                      |
   +=============================+          |
   | Mediator                    |          |
   +=============================+          |
     |                                      |
     |  (Anonymizing Mediator)              |
     V                                      V
   +==========================================+
   | Collecting Process                       |
   +==========================================+
     |
     |  (Anonymizing CP/File Writer)
     V
   +--------------------+
   | IPFIX File Storage |
   +--------------------+

             Figure 1: Potential Anonymization Locations

   Anonymization is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research.  Therefore, direct anonymization of
   flow data on initial export is only applicable in certain restricted
   circumstances: when the Exporting Process (EP) is "publishing" data
   to a Collecting Process (CP) directly, and the Exporting Process and
   Collecting Process are operated by different entities.
Note that certain guidelines in Section 7.2.3 with respect to
timestamp anonymization may not apply in this case, as the Collecting
Process may be able to deduce certain timing information from the
time at which each Message is received.

A much more flexible arrangement is to anonymize data within a
Mediator [I-D.ietf-ipfix-mediators-framework]. Here, original data
is sent to a Mediator, which performs the anonymization function and
re-exports the anonymized data. Such a Mediator could be located at
the administrative domain boundary of the initial Exporting Process
operator, exporting anonymized data to other consumers outside the
organization. In this case, the original Exporter SHOULD use TLS
[RFC5246] as specified in [RFC5101] to secure the channel to the
Mediator, and the Mediator should follow the guidelines in
Section 7.2, to mitigate the risk of original data disclosure.

When data is to be published as an anonymized data set in an IPFIX
File [RFC5655], the anonymization may be done at the final Collecting
Process before storage and dissemination, as well. In this case, the
Collector should follow the guidelines in Section 7.2, especially as
regards File-specific Options in Section 7.2.4.

In each of these data flows, the anonymization of records is
undertaken by an Intermediate Anonymization Process (IAP); the data
flows into and out of this IAP are shown in Figure 2 below.
packets --+                    +- IPFIX Messages -+
          |                    |                  |
          V                    V                  V
+==================+ +====================+ +=============+
| Metering Process | | Collecting Process | | File Reader |
+==================+ +====================+ +=============+
          |     Non-anonymized |          Records |
          V                    V                  V
+=========================================================+
|        Intermediate Anonymization Process (IAP)         |
+=========================================================+
          | Anonymized          ^       Anonymized |
          | Records             |       Records    |
          V                     |                  V
+===================+     Anonymization     +=============+
| Exporting Process |<--- Parameters ------>| File Writer |
+===================+                       +=============+
          |                                        |
          +------------> IPFIX Messages <----------+

Figure 2: Data flows through the anonymization process

Anonymization parameters must also be available to the Exporting
Process and/or File Writer in order to ensure header data is also
appropriately anonymized as in Section 7.2.3.

Following each of the data flows through the IAP, we describe five
basic types of anonymization arrangements within this framework in
Figure 3. In addition to the three arrangements described in detail
above, anonymization can also be done at a collocated Metering
Process (MP) and File Writer (FW) (see section 7.3.2 of [RFC5655]),
or at a file manipulator, which combines a File Writer with a File
Reader (FR) (see section 7.3.7 of [RFC5655]).

         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| EP |-> Anonymization on Original Exporter
         +----+  +-----+  +----+

         +----+  +-----+  +----+
 pkts -> | MP |->| IAP |->| FW |-> Anonymizing collocated MP/File Writer
         +----+  +-----+  +----+

         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| EP |-> Anonymizing Mediator (Masq. Proxy)
         +----+  +-----+  +----+

         +----+  +-----+  +----+
IPFIX -> | CP |->| IAP |->| FW |-> Anonymizing collocated CP/File Writer
         +----+  +-----+  +----+

         +----+  +-----+  +----+
IPFIX -> | FR |->| IAP |->| FW |-> Anonymizing file manipulator
File     +----+  +-----+  +----+

Figure 3: Possible anonymization arrangements in the IPFIX
architecture

Note that anonymization may occur at more than one location within a
given collection infrastructure, to provide varying levels of
anonymization, disclosure risk, or data utility for specific
purposes.

7.2. IPFIX-Specific Anonymization Guidelines

In implementing and deploying the anonymization techniques described
in this document, implementors should note that IPFIX already
provides features that support anonymized data export, and use these
where appropriate. Care must also be taken that data structures
supporting the operation of the protocol itself do not leak data that
could be used to reverse the anonymization applied to the flow data.
Such data structures may appear in the header, or within the data
stream itself, especially as options data. Each of these and their
impact on specific anonymization techniques is noted in a separate
subsection below.

7.2.1. Appropriate Use of Information Elements for Anonymized Data

Note, as in Section 6 above, that black-marker anonymized fields
SHOULD NOT be exported at all; the absence of the field in a given
Data Set is implicitly declared by not including the corresponding
Information Element in the Template describing that Data Set.

When using precision degradation of timestamps, Exporting Processes
SHOULD export timing information using Information Elements of an
appropriate precision, as explained in Section 4.5 of [RFC5153].
For example, timestamps measured in millisecond-level precision and
degraded to second-level precision should use flowStartSeconds and
flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds.

When exporting anonymized data and anonymization metadata, Exporting
Processes SHOULD ensure that the combination of Information Element
and declared anonymization technique is compatible. Specifically,
the applicable and recommended Information Element types and
semantics for each technique are noted in the description of the
anonymizationTechnique Information Element in Section 6.2.2. In this
description, a timestamp is an Information Element with the data type
dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or
dateTimeNanoseconds; an address is an Information Element with the
data type ipv4Address, ipv6Address, or macAddress; and an identifier
is an Information Element with identifier data type semantics.
Exporting Processes MUST NOT export Anonymization Options records
binding techniques to Information Elements to which they are not
applicable, and SHOULD NOT export Anonymization Options records
binding techniques to Information Elements for which they are not
recommended.

7.2.2. Export of Perimeter-Based Anonymization Policies

Data collected from a single network may require different
anonymization policies for addresses internal and external to the
network. For example, internal addresses could be subject to simple
permutation, while external addresses could be aggregated into
networks by truncation.
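
As a rough illustration of such a perimeter policy, the following
Python sketch permutes the host part of internal addresses and
truncates external addresses to their /24 network. The internal
network 198.51.100.0/24, the 8-bit host width, the fixed seed, and the
name anonymize_address are assumptions made for this sketch only; a
real deployment would derive the permutation from a secret rather than
from a seeded pseudorandom generator.

```python
import ipaddress
import random

# Assumption for this sketch: the network under observation.
INTERNAL_NET = ipaddress.ip_network("198.51.100.0/24")

# Illustrative permutation of the 256 possible host values.
# A seeded PRNG is used only to keep the sketch reproducible.
_rng = random.Random(2011)
_HOST_MAP = list(range(256))
_rng.shuffle(_HOST_MAP)


def anonymize_address(addr: str) -> str:
    """Permute internal hosts; truncate external addresses to /24."""
    ip = ipaddress.IPv4Address(addr)
    value = int(ip)
    if ip in INTERNAL_NET:
        # Internal: keep the network prefix, permute the host part.
        value = (value & 0xFFFFFF00) | _HOST_MAP[value & 0xFF]
    else:
        # External: truncation, i.e. the low-order 8 bits are zeroed.
        value &= 0xFFFFFF00
    return str(ipaddress.IPv4Address(value))
```

With this sketch, anonymize_address("192.0.2.3") yields "192.0.2.0",
while every internal address maps to a distinct address inside the
internal network.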
When exporting anonymized perimeter bidirectional flow (biflow) data
as in section 5.2 of [RFC5103], this arrangement may be easily
represented by specifying one technique for source endpoint
information (which represents the external endpoint in a perimeter
biflow) and one technique for destination endpoint information (which
represents the internal address in a perimeter biflow).

However, it can also be useful to represent perimeter-based
anonymization policies with unidirectional flow (uniflow), or
non-perimeter biflow data. In this case, the Perimeter Anonymization
bit (bit 2) in the anonymizationFlags Information Element describing
the anonymized address Information Elements can be set to change the
meaning of "source" and "destination" of Information Elements to mean
"external" and "internal" as with perimeter biflows, but only with
respect to anonymization policies.

7.2.3. Anonymization of Header Data

Each IPFIX Message contains a Message Header; within this Message
Header are contained two fields which may be used to break certain
anonymization techniques: the Export Time, and the Observation Domain
ID.

Exporting Processes exporting IPFIX Messages containing anonymized
timestamp data, where the original Export Time Message header has
some relationship to the anonymized timestamps, SHOULD anonymize the
Export Time header field so that the Export Time is consistent with
the anonymized timestamp data. Otherwise, relationships between
export and flow time could be used to partially or totally reverse
timestamp anonymization. When anonymizing timestamps and the Export
Time header field, Exporting Processes SHOULD avoid times too far in
the past or future; while [RFC5101] does not make any allowance for
Export Time error detection, it is sensible that Collecting Processes
may interpret Messages with seemingly nonsensical Export Times as
erroneous.
Specific limits are implementation-dependent, but this may cause
interoperability problems when anonymizing the Export Time header
field.

The similarity in size between an Observation Domain ID and an IPv4
address (32 bits) may lead to a temptation to use an IPv4 interface
address on the Metering or Exporting Process as the Observation
Domain ID. If this address bears some relation to the IP addresses
in the flow data (e.g., shares a network prefix with internal
addresses) and the IP addresses in the flow data are anonymized in a
structure-preserving way, then the Observation Domain ID may be used
to break the IP address anonymization. Use of an IPv4 interface
address on the Metering or Exporting Process as the Observation
Domain ID is NOT RECOMMENDED in this case.

7.2.4. Anonymization of Options Data

IPFIX uses the Options mechanism to export, among other things,
metadata about exported flows and the flow collection infrastructure.
As with the IPFIX Message Header, certain Options recommended in
[RFC5101] and [RFC5655] containing flow timestamps and network
addresses of Exporting and Collecting Processes may be used to break
certain anonymization techniques. When using these Options along
with anonymized data export and storage, values within the Options
which could be used to break the anonymization SHOULD themselves be
anonymized or omitted.

The Exporting Process Reliability Statistics Options Template,
recommended in [RFC5101], contains an Exporting Process ID field,
which may be an exportingProcessIPv4Address Information Element or an
exportingProcessIPv6Address Information Element.
If the Exporting Process address bears some relation to the IP
addresses in the flow data (e.g., shares a network prefix with
internal addresses) and the IP addresses in the flow data are
anonymized in a structure-preserving way, then the Exporting Process
address may be used to break the IP address anonymization. Exporting
Processes exporting anonymized data in this situation SHOULD mitigate
the risk of attack either by omitting Options described by the
Exporting Process Reliability Statistics Options Template, or by
anonymizing the Exporting Process address using a similar technique
to that used to anonymize the IP addresses in the exported data.

Similarly, the Export Session Details Options Template and Message
Details Options Template specified for the IPFIX File Format
[RFC5655] may contain the exportingProcessIPv4Address Information
Element or the exportingProcessIPv6Address Information Element to
identify an Exporting Process from which a flow record was received,
and the collectingProcessIPv4Address Information Element or the
collectingProcessIPv6Address Information Element to identify the
Collecting Process which received it. If the Exporting Process or
Collecting Process address bears some relation to the IP addresses in
the data set (e.g., shares a network prefix with internal addresses)
and the IP addresses in the data set are anonymized in a
structure-preserving way, then the Exporting Process or Collecting
Process address may be used to break the IP address anonymization.
Since these Options Templates are primarily intended for storing
IPFIX Transport Session data for auditing, replay, and testing
purposes, and in order to mitigate the risk of attack, it is NOT
RECOMMENDED that storage of anonymized data include these Options
Templates.

The Message Details Options Template specified for the IPFIX File
Format [RFC5655] also contains the collectionTimeMilliseconds
Information Element.
As with the Export Time Message Header field, if the exported data
set contains anonymized timestamp information, and the
collectionTimeMilliseconds Information Element in a given Message has
some relationship to the anonymized timestamp information, then this
relationship can be exploited to reverse the timestamp anonymization.
Since this Options Template is primarily intended for storing IPFIX
Transport Session data for auditing, replay, and testing purposes,
and in order to mitigate the risk of attack, it is NOT RECOMMENDED
that storage of anonymized data include this Options Template.

Since the Time Window Options Template specified for the IPFIX File
Format [RFC5655] refers to the timestamps within the data set to
provide partial table of contents information for an IPFIX File,
Options described by this template SHOULD be written using the
anonymized timestamps instead of the original ones.

7.2.5. Special-Use Address Space Considerations

When anonymizing IP addresses in data for transport or storage using
IPFIX, and the analysis purpose permits doing so, it is RECOMMENDED
to filter out or leave unanonymized data containing the special-use
IPv4 addresses enumerated in [RFC5735] or the special-use IPv6
addresses enumerated in [RFC5156]. Data containing these addresses
(e.g., 0.0.0.0 and 169.254.0.0/16 for link-local autoconfiguration in
IPv4 space) are often associated with specific, well-known behavioral
patterns. Detection of these patterns in anonymized data can lead to
deanonymization of these special-use addresses, which increases the
chance of a complete reversal of anonymization by an attacker,
especially of prefix-preserving techniques.

7.2.6. Protecting Out-of-Band Configuration and Management Data

Special care should be taken when exporting or sharing anonymized
data to avoid information leakage via the configuration or management
planes of the IPFIX Device containing the Exporting Process or the
File Writer. For example, adding noise to counters is useless if the
receiver can deduce the values in the counters from SNMP information,
and concealing the network under test is similarly useless if such
information is available in a configuration document. As the
specifics of these concerns are largely implementation- and
deployment-dependent, specific mitigation is out of scope for this
document. The general ground rule is that information of similar
type to that anonymized SHOULD NOT be made available to the receiver
by any means, whether in the Data Records, in IPFIX protocol
structures such as Message Headers, or out-of-band.

8. Examples

In this example, consider the export or storage of an anonymized IPv4
data set from a single network described by a simple template
containing a timestamp in seconds, a five-tuple, and packet and octet
counters. The template describing each record in this data set is
shown in Figure 4.
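
For orientation, the Template Set carrying this template can be
sketched in Python. The Information Element numbers and field lengths
are those of the template described above (Figure 4); the helper name
encode_template_set is hypothetical, and the sketch assumes the
Template Set layout of [RFC5101] with Set ID 2, no enterprise-specific
Information Elements, and no padding.

```python
import struct

# (Information Element number, field length in octets), in template order.
FIELDS = [
    (150, 4),  # flowStartSeconds
    (8, 4),    # sourceIPv4Address
    (12, 4),   # destinationIPv4Address
    (7, 2),    # sourceTransportPort
    (11, 2),   # destinationTransportPort
    (2, 4),    # packetDeltaCount
    (1, 4),    # octetDeltaCount
    (4, 1),    # protocolIdentifier
]


def encode_template_set(template_id: int, fields) -> bytes:
    """Encode one Template Record inside a Template Set (Set ID 2)."""
    record = struct.pack("!HH", template_id, len(fields))
    for element_id, length in fields:
        # Enterprise bit is 0 because element_id is below 0x8000.
        record += struct.pack("!HH", element_id, length)
    set_header = struct.pack("!HH", 2, 4 + len(record))
    return set_header + record


template_set = encode_template_set(256, FIELDS)
record_length = sum(length for _, length in FIELDS)
```

The encoded set is 40 octets long, which matches the Length field of
the Template Set, and each Data Record described by it occupies 25
octets.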
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 2                    | Length = 40                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template ID = 256             | Field Count = 8               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| flowStartSeconds 150        | Field Length = 4              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceIPv4Address 8         | Field Length = 4              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationIPv4Address 12   | Field Length = 4              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| sourceTransportPort 7       | Field Length = 2              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| destinationTransportPort 11 | Field Length = 2              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| packetDeltaCount 2          | Field Length = 4              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| octetDeltaCount 1           | Field Length = 4              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| protocolIdentifier 4        | Field Length = 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4: Example Flow Template

Suppose that this data set is anonymized according to the following
policy:

o  IP addresses within the network are protected by reverse
   truncation.

o  IP addresses outside the network are protected by
   prefix-preserving anonymization.

o  Octet counts are exported using degraded precision in order to
   provide minimal protection against fingerprinting attacks.

o  All other fields are exported unanonymized.

In order to export anonymization records for this template and
policy, first, the Anonymization Options Template shown in Figure 5
is exported.
For this example, the optional privateEnterpriseNumber and
informationElementIndex Information Elements are omitted, because
they are not used.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 3                    | Length = 26                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template ID = 257             | Field Count = 4               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Scope Field Count = 2         |0| templateID 145              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2              |0| informationElementId 303    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2              |0| anonymizationFlags TBD1     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2              |0| anonymizationTechnique TBD2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Field Length = 2              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5: Example Anonymization Options Template

Following the Anonymization Options Template comes a Data Set
containing Anonymization Records. This data set has an entry for
each Information Element Specifier in Template 256 describing the
flow records. This Data Set is shown in Figure 6. Note that
sourceIPv4Address and destinationIPv4Address have the Perimeter
Anonymization (0x0004) flag set in anonymizationFlags, meaning that
the source address should be treated as network-external, and the
destination address as network-internal.
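
The flag values used in these Anonymization Records can be decoded
with a few lines of Python. This is a minimal sketch covering only
the bits described in this section (the stability class in bits 0 and
1, and Perimeter Anonymization in bit 2); the names STABILITY_CLASSES
and decode_flags are illustrative and not part of any IPFIX
implementation.

```python
# Stability class: bit 1 and bit 0 of anonymizationFlags.
STABILITY_CLASSES = {
    0b00: "Undefined",
    0b01: "Session",
    0b10: "Exporter-Collector Pair",
    0b11: "Stable",
}

# Perimeter Anonymization: bit 2 of anonymizationFlags.
PERIMETER_ANONYMIZATION = 0x0004


def decode_flags(flags: int):
    """Return (stability class name, perimeter bit set?)."""
    stability = STABILITY_CLASSES[flags & 0x0003]
    perimeter = bool(flags & PERIMETER_ANONYMIZATION)
    return stability, perimeter
```

For the values in this example, 0x0005 decodes to Session stability
with the perimeter bit set, 0x0007 to Stable with the perimeter bit
set, and 0x0003 to Stable without it.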
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Set ID = 257                  | Length = 68                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | flowStartSeconds IE 150       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000               | Not Anonymized 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | sourceIPv4Address IE 8        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Session SC 0x0005  | Structured Permutation 6      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | destinationIPv4Address IE 12  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Perimeter, Stable 0x0007      | Reverse Truncation 7          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | sourceTransportPort IE 7      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000               | Not Anonymized 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | dest.TransportPort IE 11      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000               | Not Anonymized 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | packetDeltaCount IE 2         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000               | Not Anonymized 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | octetDeltaCount IE 1          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stable 0x0003                 | Precision Degradation 2       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Template 256                  | protocolIdentifier IE 4       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| no flags 0x0000               | Not Anonymized 1              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6: Example Anonymization Records

Following the Anonymization Records come the data sets containing the
anonymized data, exported according to the template in Figure 4.

Bringing it all together, consider an IPFIX Message containing three
real data records and the necessary templates to export them, shown
in Figure 7. (Note that the scale of this message is 8 bytes per
line, for compactness; lines of dots '. . . . . ' represent shifting
of the example bit structure for clarity.)

           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a        | length 135    | export time 1271227717        | msg
| sequence 0                    | domain 1                      | hdr
| SetID 2       | length 40     | tid 256       | fields 8      | tmpl
| IE 150        | length 4      | IE 8          | length 4      | set
| IE 12         | length 4      | IE 7          | length 2      |
| IE 11         | length 2      | IE 2          | length 4      |
| IE 1          | length 4      | IE 4          | length 1      |
| SetID 256     | length 79     | time 1271227681               | data
| sip 192.0.2.3                 | dip 198.51.100.7              | set
| sp 53         | dp 53         | packets 1                     |
| bytes 74                      | prt 17  | . . . . . . . . . . .
| time 1271227682               | sip 198.51.100.7              |
| dip 192.0.2.88                | sp 5091       | dp 80         |
| packets 60                    | bytes 2896                    |
| prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683               | sip 198.51.100.7              |
| dip 203.0.113.9               | sp 5092       | dp 80         |
| packets 44                    | bytes 2037                    |
| prt 6   |
+---------+

Figure 7: Example Real Message

The corresponding anonymized message is then shown in Figure 8. The
options template set describing Anonymization Records and the
Anonymization Records themselves are added; IP addresses and byte
counts are anonymized as declared.
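
The declared transformations can be checked against the example
values with a short Python sketch. It assumes that reverse truncation
keeps the low-order 8 bits of an internal address and that precision
degradation rounds octet counts to the nearest multiple of 100; this
section does not fix either parameter, and these were chosen only
because they are consistent with the values in Figures 7 and 8. The
prefix-preserving mapping of external addresses depends on a secret
and is not reproduced. All function names are hypothetical.

```python
import ipaddress


def reverse_truncate(addr: str, keep_bits: int = 8) -> str:
    """Keep only the low-order keep_bits of an IPv4 address (assumed 8)."""
    mask = (1 << keep_bits) - 1
    value = int(ipaddress.IPv4Address(addr)) & mask
    return str(ipaddress.IPv4Address(value))


def degrade_octets(count: int, step: int = 100) -> int:
    """Round a counter to the nearest multiple of step (assumed 100)."""
    return (count + step // 2) // step * step


# Message lengths: 16-octet Message Header plus the Set lengths shown
# in the figures (template, options template, anon records, data).
REAL_MESSAGE_LENGTH = 16 + 40 + 79
ANONYMIZED_MESSAGE_LENGTH = 16 + 40 + 30 + 68 + 79
```

Under these assumptions the internal address 198.51.100.7 becomes
0.0.0.7, the octet counts 74, 2896, and 2037 become 100, 2900, and
2000, and the two message lengths come to 135 and 233 octets.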
           1         2         3         4         5         6
 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x000a        | length 233    | export time 1271227717        | msg
| sequence 0                    | domain 1                      | hdr
| SetID 2       | length 40     | tid 256       | fields 8      | tmpl
| IE 150        | length 4      | IE 8          | length 4      | set
| IE 12         | length 4      | IE 7          | length 2      |
| IE 11         | length 2      | IE 2          | length 4      |
| IE 1          | length 4      | IE 4          | length 1      |
| SetID 3       | length 30     | tid 257       | fields 4      | opt
| scope 2       | . . . . . . . . . . . . . . . . . . . . . . . . tmpl
| IE 145        | length 2      | IE 303        | length 2      | set
| IE TBD1       | length 2      | IE TBD2       | length 2      |
| SetID 257     | length 68     | . . . . . . . . . . . . . . . . anon
| tid 256       | IE 150        | flags 0       | tech 1        | recs
| tid 256       | IE 8          | flags 5       | tech 6        |
| tid 256       | IE 12         | flags 7       | tech 7        |
| tid 256       | IE 7          | flags 0       | tech 1        |
| tid 256       | IE 11         | flags 0       | tech 1        |
| tid 256       | IE 2          | flags 0       | tech 1        |
| tid 256       | IE 1          | flags 3       | tech 2        |
| tid 256       | IE 4          | flags 0       | tech 1        |
| SetID 256     | length 79     | time 1271227681               | data
| sip 254.202.119.209           | dip 0.0.0.7                   | set
| sp 53         | dp 53         | packets 1                     |
| bytes 100                     | prt 17  | . . . . . . . . . . .
| time 1271227682               | sip 0.0.0.7                   |
| dip 254.202.119.6             | sp 5091       | dp 80         |
| packets 60                    | bytes 2900                    |
| prt 6   | . . . . . . . . . . . . . . . . . . . . . . . . . . .
| time 1271227683               | sip 0.0.0.7                   |
| dip 2.19.199.176              | sp 5092       | dp 80         |
| packets 44                    | bytes 2000                    |
| prt 6   |
+---------+

Figure 8: Corresponding Anonymized Message

9. Security Considerations

This document provides guidelines for exporting metadata about
anonymized data in IPFIX, or storing metadata about anonymized data
in IPFIX Files. It is not intended as a general statement on the
applicability of specific flow data anonymization techniques.
Exporters or publishers of anonymized data must take care that the
applied anonymization technique is appropriate for the data source,
the purpose, and the risk of deanonymization of a given application.
Research in anonymization techniques, and techniques for
deanonymization, is ongoing, and currently "safe" anonymization
techniques may be rendered unsafe by future developments.

We note specifically that anonymization is not a replacement for
encryption for confidentiality. It is only appropriate for
protecting identifying information in data to be used for purposes in
which the protected data is irrelevant. Confidentiality in export is
best served by using TLS [RFC5246] or DTLS [RFC4347] as in the
Security Considerations section of [RFC5101], and in long-term
storage by implementation-specific protection applied as in the
Security Considerations section of [RFC5655]. Indeed,
confidentiality and anonymization are not mutually exclusive, as
encryption for confidentiality may be applied to anonymized data
export or storage, as well, when the anonymized data is not intended
for public release.

We note as well that care should be taken even with well-anonymized
data, and anonymized data should still be treated as
privacy-sensitive. Anonymization reduces the risk of misuse, but is
not a complete solution to the problem of protecting end-user privacy
in network flow trace analysis.

When using pseudonymization techniques that have a mutable mapping,
there is an inherent tradeoff in the stability of the map between
long-term comparability and security of the data set against
deanonymization. In general, deanonymization attacks are more
effective given more information, so the longer a given mapping is
valid, the more information can be applied to deanonymization.
The specific details of this are technique-dependent and therefore
out of the scope of this document.

When releasing anonymized data, publishers need to ensure that data
that could be used in deanonymization is not leaked through a side
channel. The entire workflow (hardware, software, operational
policies and procedures, etc.) for handling anonymized data must be
evaluated for risk of data leakage. While most of these possible
side channels are out of scope for this document, guidelines for
reducing the risk of information leakage specific to the IPFIX export
protocol are provided in Section 7.2.

Note that the Security Considerations section of [RFC5101] applies as
well to the export of anonymized data, and the Security
Considerations section of [RFC5655] to the storage of anonymized
data, or the publication of anonymized traces.

10. IANA Considerations

This document specifies the creation of several new IPFIX Information
Elements in the IPFIX Information Element registry located at
http://www.iana.org/assignments/ipfix, as defined in Section 6.2
above. IANA has assigned the following Information Element numbers
for their respective Information Elements as specified below:

o  Information Element number TBD1 for the anonymizationFlags
   Information Element.

o  Information Element number TBD2 for the anonymizationTechnique
   Information Element.

o  Information Element number TBD3 for the informationElementIndex
   Information Element.

[NOTE for IANA: The text TBDn should be replaced with the respective
assigned Information Element numbers where they appear in this
document. Information Element numbers should be assigned outside the
NetFlow V9 compatibility range, as these Information Elements are not
supported by NetFlow V9.]

11. Acknowledgments

We thank Paul Aitken and John McHugh for their comments and insight,
and Carsten Schmoll, Benoit Claise, Lothar Braun, Dan Romascanu,
Stewart Bryant, and Sean Turner for their reviews. Special thanks to
the FP7 PRISM and DEMONS projects for their material support of this
work.

12. References

12.1. Normative References

[RFC5101]  Claise, B., "Specification of the IP Flow Information
           Export (IPFIX) Protocol for the Exchange of IP Traffic
           Flow Information", RFC 5101, January 2008.

[RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
           Meyer, "Information Model for IP Flow Information Export",
           RFC 5102, January 2008.

[RFC5103]  Trammell, B. and E. Boschi, "Bidirectional Flow Export
           Using IP Flow Information Export (IPFIX)", RFC 5103,
           January 2008.

[RFC5655]  Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
           Wagner, "Specification of the IP Flow Information Export
           (IPFIX) File Format", RFC 5655, October 2009.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC5735]  Cotton, M. and L. Vegoda, "Special Use IPv4 Addresses",
           BCP 153, RFC 5735, January 2010.

[RFC5156]  Blanchet, M., "Special-Use IPv6 Addresses", RFC 5156,
           April 2008.

12.2. Informative References

[RFC5470]  Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek,
           "Architecture for IP Flow Information Export", RFC 5470,
           March 2009.

[RFC5472]  Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP
           Flow Information Export (IPFIX) Applicability", RFC 5472,
           March 2009.

[I-D.ietf-ipfix-mediators-framework]
           Kobayashi, A., Claise, B., Muenz, G., and K. Ishibashi,
           "IPFIX Mediation: Framework",
           draft-ietf-ipfix-mediators-framework-09 (work in
           progress), October 2010.

[I-D.ietf-ipfix-export-per-sctp-stream]
           Claise, B., Aitken, P., Johnson, A., and G. Muenz, "IPFIX
           Export per SCTP Stream",
           draft-ietf-ipfix-export-per-sctp-stream-08 (work in
           progress), May 2010.

[RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P.
           Aitken, "IP Flow Information Export (IPFIX) Implementation
           Guidelines", RFC 5153, April 2008.

[RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
           "Requirements for IP Flow Information Export (IPFIX)",
           RFC 3917, October 2004.

[RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
           Architecture", RFC 4291, February 2006.

[RFC4347]  Rescorla, E. and N. Modadugu, "Datagram Transport Layer
           Security", RFC 4347, April 2006.

[RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
           (TLS) Protocol Version 1.2", RFC 5246, August 2008.

[Bur10]    Burkhart, M., Schatzmann, D., Trammell, B., and E. Boschi,
           "The Role of Network Trace Anonymization Under Attack",
           ACM Computer Communications Review, vol. 40, no. 1,
           pp. 6-11, January 2010.

[Mur07]    Murdoch, S. and P. Zielinski, "Sampled Traffic Analysis by
           Internet-Exchange-Level Adversaries", Proceedings of the
           7th Workshop on Privacy Enhancing Technologies, Ottawa,
           Canada, June 2007.

Authors' Addresses

Elisa Boschi
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland

Email: boschie@tik.ee.ethz.ch

Brian Trammell
Swiss Federal Institute of Technology Zurich
Gloriastrasse 35
8092 Zurich
Switzerland

Phone: +41 44 632 70 13
Email: trammell@tik.ee.ethz.ch