IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: January 11, 2010                                  July 10, 2009


                     IP Flow Anonymisation Support
                     draft-boschi-ipfix-anon-04.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 11, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document describes anonymisation techniques for IP flow data and
   the export of anonymised data using the IPFIX protocol. It provides
   a categorisation of common anonymisation schemes and defines the
   parameters needed to describe them. It provides guidelines for the
   implementation of anonymised data export and storage over IPFIX, and
   describes an Options-based method for anonymisation metadata export
   within the IPFIX protocol, providing the basis for the definition of
   information models for configuring anonymisation techniques within an
   IPFIX Metering or Exporting Process, and for reporting the technique
   in use to an IPFIX Collecting Process.

Table of Contents

   1. Introduction
      1.1. IPFIX Protocol Overview
      1.2. IPFIX Documents Overview
      1.3. Anonymisation within the IPFIX Architecture
   2. Terminology
   3. Categorisation of Anonymisation Techniques
   4. Anonymisation of IP Flow Data
      4.1. IP Address Anonymisation
         4.1.1. Truncation
         4.1.2. Random Permutation
         4.1.3. Prefix-preserving Pseudonymisation
      4.2. Hardware Address Anonymisation
         4.2.1. Random Permutation
         4.2.2. Structured Pseudonymisation
      4.3. Timestamp Anonymisation
         4.3.1. Precision Degradation
         4.3.2. Enumeration
         4.3.3. Random Time Shifts
      4.4. Counter Anonymisation
         4.4.1. Precision Degradation
         4.4.2. Binning
         4.4.3. Random Noise Addition
      4.5. Anonymisation of Other Flow Fields
         4.5.1. Binning
         4.5.2. Random Permutation
   5. Parameters for the Description of Anonymisation Techniques
      5.1. Stability
      5.2. Truncation Length
      5.3. Bin Map
      5.4. Permutation
      5.5. Shift Amount
   6. Anonymisation Export Support in IPFIX
      6.1. Anonymisation Options Template
      6.2. Recommended Information Elements for Anonymisation Metadata
         6.2.1. anonymisationStability
         6.2.2. anonymisationTechnique
         6.2.3. informationElementIndex
   7. Applying Anonymisation Techniques to IPFIX Export and Storage
      7.1. Arrangement of Processes in IPFIX Anonymisation
      7.2. IPFIX-Specific Anonymisation Guidelines
         7.2.1. Appropriate Use of Information Elements for Anonymised
                Data
         7.2.2. Anonymisation of Header Data
         7.2.3. Anonymisation of Options Data
   8. Examples
   9. Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. References
      12.1. Normative References
      12.2. Informative References
   Authors' Addresses

1. Introduction

   The standardisation of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes. However, with wider dissemination come
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks. While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymisation (i.e., the deletion or transformation of
   information that is considered sensitive and could be used to reveal
   the identity of subjects involved in a communication) is an important
   tool for the protection of privacy within network measurement
   infrastructures.

   This document presents a mechanism for representing anonymised data
   within IPFIX and guidelines for using it. It begins with a
   categorisation of anonymisation techniques. It then describes the
   applicability of each technique to commonly anonymisable fields of
   IP flow data, organized by information element data type and
   semantics as in [RFC5102]; enumerates the parameters required by
   each of the applicable anonymisation techniques; and provides
   guidelines for the use of each of these techniques in accordance
   with best practices in data protection. Finally, it specifies a
   mechanism for exporting anonymised data and binding anonymisation
   metadata to templates using IPFIX Options.

1.1. IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which
   { value } fields are present in Data Records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings. Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format. See [RFC5101] for more information.

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates. The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2. IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   architecture for the export of measured IP flow information out of an
   IPFIX Exporting Process to an IPFIX Collecting Process, and the basic
   terminology used to describe the elements of this architecture, per
   the requirements defined in "Requirements for IP Flow Information
   Export" [RFC3917]. The IPFIX Protocol document [RFC5101] then covers
   the details of the method for transporting IPFIX Data Records and
   Templates via a congestion-aware transport protocol from an IPFIX
   Exporting Process to an IPFIX Collecting Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [RFC5472] describes the various
   applications of the IPFIX protocol and their use of information
   exported via IPFIX, and relates the IPFIX architecture to other
   measurement architectures and frameworks.

   Additionally, the "Specification of the IPFIX File Format"
   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
   Protocol for the storage of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymisation metadata. The anonymisation
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

1.3. Anonymisation within the IPFIX Architecture

   "Architecture for IP Flow Information Export" [RFC5470] defines the
   functions performed in sequence by the various functional blocks in
   an IPFIX Device as in the figure below.

                    Packet(s) coming into Observation Point(s)
                      |                                   |
                      v                                   v
     +----------------+-------------------------+   +-----+-------+
     |          Metering Process on an          |   |             |
     |             Observation Point            |   |             |
     |                                          |   |             |
     |   packet header capturing                |   |             |
     |        |                                 |...| Metering    |
     |   timestamping                           |   | Process N   |
     |        |                                 |   |             |
     | +----->+                                 |   |             |
     | |      |                                 |   |             |
     | |   sampling Si (1:1 in case of no       |   |             |
     | |      |          sampling)              |   |             |
     | |   filtering Fi (select all when        |   |             |
     | |      |          no criteria)           |   |             |
     | +------+                                 |   |             |
     |        |                                 |   |             |
     |        |    Timing out Flows             |   |             |
     |        |  Handle resource overloads      |   |             |
     +--------|---------------------------------+   +-----|-------+
              |                                           |
      Flow Records (identified by Observation Domain)  Flow Records
              |                                           |
              +---------+---------------------------------+
                        |
   +--------------------|----------------------------------------------+
   |                    |             Exporting Process                |
   |+-------------------|-------------------------------------------+  |
   ||                   v          IPFIX Protocol                   |  |
   ||+-----------------------------+  +----------------------------+|  |
   |||Rules for                    |  |Functions                   ||  |
   ||| Picking/sending Templates   |  |-Packetise selected Control ||  |
   ||| Picking/sending Flow Records|->| & data Information into    ||  |
   ||| Encoding Template & data    |  | IPFIX export packets.      ||  |
   ||| Selecting Flows to export(*)|  |-Handle export errors       ||  |
   ||+-----------------------------+  +----------------------------+|  |
   |+----------------------------+----------------------------------+  |
   |                             |                                     |
   |                  exported IPFIX Messages                          |
   |                             |                                     |
   |                +------------+-----------------+                   |
   |                |  Anonymise export packet(*)  |                   |
   |                +------------+-----------------+                   |
   |                             |                                     |
   |                +------------+-----------------+                   |
   |                |      Transport Protocol      |                   |
   |                +------------+-----------------+                   |
   |                             |                                     |
   +-----------------------------+-------------------------------------+
                                 |
                                 v
                 IPFIX export packet to Collector

   (*) indicates that the block is optional.

                Figure 1: IPFIX Device functional blocks

   Note that, according to the original architecture specification,
   IPFIX Message anonymisation is optionally performed as the final
   operation before handing the Message to the transport protocol for
   export. While no provision is made in the architecture for
   anonymisation metadata as in Section 6, this arrangement does allow
   for the message rewriting necessary for comprehensive anonymisation
   of IPFIX export as in Section 7. The development of the IPFIX
   Mediation framework [I-D.ietf-ipfix-mediators-framework] and the
   IPFIX File Format [I-D.ietf-ipfix-file] expands upon this initial
   architectural allowance for anonymisation by adding to the list of
   places that anonymisation may be applied. The former specifies IPFIX
   Mediators, which rewrite existing IPFIX messages, and the latter
   specifies a method for storage of IPFIX data in files.

   More detail on the applicable architectural arrangements of
   anonymisation can be found in Section 7.1.

2. Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Categorisation of Anonymisation Techniques

   Anonymisation modifies a data set in order to protect the identity of
   the people or entities described by the data set from disclosure.
   With respect to network traffic data, anonymisation generally
   attempts to preserve some set of properties of the network traffic
   useful for a given application or applications, while ensuring the
   data cannot be traced back to the specific networks, hosts, or users
   generating the traffic.

   Anonymisation may be broadly classified according to two properties:
   recoverability and countability. All anonymisation techniques map
   the real space of identifiers or values into a separate, anonymised
   space, according to some function. A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed and a real identifier can be recovered from a given
   replacement identifier.

   Countability compares the dimension of the anonymised space (N) to
   the dimension of the real space (M), and denotes how the count of
   unique values is preserved by the anonymisation function. If the
   anonymised space is smaller than the real space, then the function is
   said to generalise the input, mapping more than one input point to
   each anonymous value (e.g., as with aggregation). By definition,
   generalisation is not recoverable.

   If the dimensions of the anonymised and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function.
   If the dimension of the anonymised space is larger, such that each
   real value maps to a set of anonymised values, then the function is
   said to be a set substitution function. Note that with set
   substitution functions, the sets of anonymised values are not
   necessarily disjoint. Either direct or set substitution functions
   are said to be one-way if there exists no method for recovering the
   real data point from an anonymised one.

   This classification is summarised in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalisation         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+

4. Anonymisation of IP Flow Data

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymisation techniques
   applicable to flow data, though each falls into the broad categories
   above. Each type of field that may commonly appear in a flow record
   may have its own applicable specific techniques.

   While anonymisation is generally applied at the resolution of single
   fields within a flow record, attacks against anonymisation use entire
   flows and relationships between hosts and flows within a given data
   set. Therefore, fields which may not necessarily be identifying by
   themselves may be anonymised in order to increase the anonymity of
   the data set as a whole.

   Of all the fields in an IP flow record, only IP addresses directly
   identify entities in the real world.
   Each IP address is associated with an interface on a network host,
   and can potentially be identified with a single user. Additionally,
   IP addresses are structured identifiers; that is, partial IP address
   prefixes may be used to identify networks just as full IP addresses
   identify hosts. This makes anonymisation of IP addresses
   particularly important.

   Hardware addresses uniquely identify devices on the network; while
   they are not often available in traffic data collected at Layer 3,
   and cannot be used to locate devices within the network, some traces
   may contain sub-IP data including hardware address data. Hardware
   addresses may be mappable to device serial numbers, and to the
   entities or individuals who purchased the devices, when combined with
   external databases. They may also leak via IPv6 addresses in certain
   circumstances. Therefore, hardware address anonymisation is also
   important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior. Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host. Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network. Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information. They can therefore be used to
   identify hosts and users.
   Timestamps and counters are also vulnerable to traffic injection
   attacks, where traffic with a known pattern is injected into a
   network under measurement, and this pattern is later identified in
   the anonymised data set.

   The simplest and most extreme form of anonymisation, which can be
   applied to any field of a flow record, is black-marker anonymisation,
   or complete deletion of a given field. Note that black-marker
   anonymisation is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymisation completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymised data set as a whole. Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk will be extensively discussed in the following
   sections; note that the techniques specifically applicable to IP
   addresses, timestamps, ports, and counters will be discussed in
   separate sections.

4.1. IP Address Anonymisation

   Since IP addresses are the most common identifiers within flow data
   that can be used to directly identify a person, organization, or
   host, most of the work on flow and trace data anonymisation has gone
   into IP address anonymisation techniques. Indeed, the aim of most
   attacks against anonymisation is to recover the map from anonymised
   IP addresses to original IP addresses, thereby identifying the hosts
   behind the anonymised addresses. There is therefore a wide range of
   IP address anonymisation schemes that fit into the following
   categories.

   +------------------------------------+---------------------+
   | Scheme                             | Action              |
   +------------------------------------+---------------------+
   | Truncation                         | Generalisation      |
   | Random Permutation                 | Direct Substitution |
   | Prefix-preserving Pseudonymisation | Direct Substitution |
   +------------------------------------+---------------------+

4.1.1. Truncation

   Truncation removes "n" of the least significant bits from an IP
   address, replacing them with zeroes. In effect, it replaces a host
   address with a network address for some fixed netblock; for IPv4
   addresses, 8-bit truncation corresponds to replacement with a /24
   network address. Truncation is a non-reversible generalisation
   scheme. Note that while truncation is effective for making hosts
   non-identifiable, it preserves information which can be used to
   identify an organization, a geographic region, a country, or a
   continent (or RIR region of responsibility).

   Truncation to an address length of 0 is equivalent to black-marker
   anonymisation. Removal of IP address information is only recommended
   for analysis tasks which have no need to separate flow data by host
   or network; e.g., as a first stage to per-application (port) or
   time-series total volume analyses.

4.1.2. Random Permutation

   Random permutation is a direct substitution technique, replacing each
   IP address with an address randomly selected from the set of possible
   IP addresses, guaranteeing that each anonymised address represents a
   unique original address. The random permutation does not preserve
   any structural information about a network, but it does preserve the
   unique count of IP addresses. Any application that requires more
   structure than host-uniqueness will not be able to use randomly
   permuted IP addresses.

4.1.3. Prefix-preserving Pseudonymisation

   Prefix-preserving pseudonymisation is a direct substitution
   technique, further restricted such that the structure of subnets is
   preserved at each level while anonymising IP addresses. If two real
   IP addresses match on a prefix of "n" bits, the two anonymised IP
   addresses will match on a prefix of "n" bits as well. This is useful
   when relationships among networks must be preserved for a given
   analysis task, but introduces structure into the anonymised data
   which can be exploited in attacks against the anonymisation
   technique.

4.2. Hardware Address Anonymisation

   Flow data containing sub-IP information can also contain identifying
   information in the form of the hardware (MAC) address. While
   hardware address information cannot be used to locate a node within a
   network, it can be used to uniquely identify a specific device.
   Vendors or organizations within the supply chain may then have the
   information necessary to identify the entity or individual that
   purchased the device.

   Hardware address information is not as structured as IP address
   information. EUI-48 and EUI-64 hardware addresses contain an
   Organizationally Unique Identifier (OUI) in the three most
   significant bytes of the address; this OUI additionally contains bits
   noting whether the address is locally or globally administered.
   Beyond this, the address is unstructured, and there is no particular
   relationship among the OUIs assigned to a given vendor.

   Note that hardware address information also appears within IPv6
   addresses, as the EUI-64 address, or EUI-48 address encoded as an
   EUI-64 address, is used as the least significant 64 bits of the IPv6
   address in the case of link local addressing or stateless
   autoconfiguration; the considerations and techniques in this section
   may then apply to such IPv6 addresses as well.
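
   As an illustration of how a hardware address can be read back out of
   such an IPv6 address, the following Python sketch may help. It is
   not part of this specification. It assumes the modified EUI-64
   interface identifier layout of the IPv6 addressing architecture, in
   which an EUI-48 address is expanded by inserting ff:fe in the middle
   and inverting the universal/local bit. All function names are
   hypothetical, and the ff:fe test is only a heuristic, since other
   interface identifiers may contain those bytes by chance.

```python
import ipaddress
from typing import Optional


def embedded_hardware_address(ipv6: str) -> Optional[bytes]:
    """Return the EUI-48 address embedded in an IPv6 interface identifier.

    Assumes the modified EUI-64 layout (an assumption of this sketch):
    ff:fe inserted between the OUI and the node identifier, with the
    universal/local bit inverted. Returns None if the marker is absent.
    """
    iid = ipaddress.IPv6Address(ipv6).packed[8:]  # least significant 64 bits
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit inversion in the first byte.
    return bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]


def oui(mac: bytes) -> bytes:
    """Three most significant bytes of an EUI-48 address."""
    return mac[:3]


def is_locally_administered(mac: bytes) -> bool:
    """True when the locally-administered bit of the first byte is set."""
    return bool(mac[0] & 0x02)


def format_mac(mac: bytes) -> str:
    """Colon-separated lowercase hexadecimal representation."""
    return ":".join(f"{b:02x}" for b in mac)


mac = embedded_hardware_address("fe80::211:22ff:fe33:4455")
if mac is not None:
    print(format_mac(mac), format_mac(oui(mac)))
```

   An anonymisation process that treats hardware addresses as sensitive
   would, under these assumptions, need to apply the same care to the
   low-order 64 bits of any IPv6 address for which such a check
   succeeds.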

   +-----------------------------+---------------------+
   | Scheme                      | Action              |
   +-----------------------------+---------------------+
   | Random Permutation          | Direct Substitution |
   | Structured Pseudonymisation | Direct Substitution |
   +-----------------------------+---------------------+

4.2.1. Random Permutation

   Random permutation is a direct substitution technique, replacing each
   hardware address with an address randomly selected from the set of
   possible hardware addresses, guaranteeing that each anonymised
   address represents a unique original address. The random permutation
   does not preserve any structural information about the addresses
   (such as the OUI), but it does preserve the unique count of hardware
   addresses. Any application that requires more structure than
   device-uniqueness will not be able to use randomly permuted hardware
   addresses.

4.2.2. Structured Pseudonymisation

   Structured pseudonymisation for MAC addresses is a direct
   substitution technique, like random permutation, but restricted such
   that the OUI (the most significant three bytes) is permuted
   separately from the node identifier, the remainder. This is useful
   when the uniqueness of OUIs must be preserved for a given analysis
   task, but introduces structure into the anonymised data which can be
   exploited in attacks against the anonymisation technique.

4.3. Timestamp Anonymisation

   The particular time at which a flow began or ended is not
   particularly identifiable information, but it can be used as part of
   attacks against other anonymisation techniques or for user profiling.
   Precise timestamps can be used in injected-traffic fingerprinting
   attacks [CITE] as well as to identify certain activity by response
   delay and size fingerprinting [CITE]. Therefore, timestamp
   information may be anonymised in order to ensure the protection of
   the entire dataset.
528 +-----------------------+----------------------------+ 529 | Scheme | Action | 530 +-----------------------+----------------------------+ 531 | Precision Degradation | Generalisation | 532 | Enumeration | Direct or Set Substitution | 533 | Random Shifts | Direct Substitution | 534 +-----------------------+----------------------------+ 536 4.3.1. Precision Degradation 538 Precision Degradation is a generalisation technique that removes the 539 most precise components of a timestamp, accounting all events 540 occurring in each given interval (e.g. one millisecond for 541 millisecond level degradation) as simultaneous. This has the effect 542 of potentially collapsing many timestamps into one. With this 543 technique time precision is reduced, and sequencing may be lost, but 544 the information at which time the event occurred is preserved. The 545 anonymised data may not be generally useful for applications which 546 require strict sequencing of flows. 548 Note that flow meters with low time precision (e.g. second precision, 549 or millisecond precision on high-capacity networks) perform the 550 equivalent of precision degradation anonymisation by their design. 552 Note also that degradation to a very low precision (e.g. on the order 553 of minutes, hours, or days) is commonly used in analyses operating on 554 time-series aggregated data, and may also be described as binning; 555 though the time scales are longer and applicability more restricted, 556 this is in principle the same operation. 558 Precision degradation to infinitely low precision is equivalent to 559 black-marker anonymisation. Removal of timestamp information is only 560 recommended for analysis tasks which have no need to separate flows 561 in time, for example for counting total volumes or unique occurrences 562 of other flow keys in an entire dataset. 564 4.3.2. 
Enumeration 566 Enumeration is a substitution function that retains the chronological 567 order in which events occurred while eliminating time information. 568 Timestamps are substituted by equidistant timestamps (or numbers) 569 starting from a randomly chosen start value. The resulting data is 570 useful for applications requiring strict sequencing, but not for 571 those requiring good timing information (e.g. delay- or jitter- 572 measurement for QoS applications or SLA validation). 574 4.3.3. Random Time Shifts 576 Random time shifts add a random offset to every timestamp within a 577 dataset. This reversible substitution technique therefore retains 578 duration and inter-event interval information as well as 579 chronological order of flows. It is primarily intended to defeat 580 traffic injection fingerprinting attacks. 582 4.4. Counter Anonymisation 584 Counters (such as packet and octet volumes per flow) are subject to 585 fingerprinting and injection attacks against anonymisation, or for 586 user profiling as timestamps are. Counter anonymisation can help 587 defeat these attacks, but are only usable for analysis tasks for 588 which relative or imprecise magnitudes of activity are useful. 590 +-----------------------+----------------------------+ 591 | Scheme | Action | 592 +-----------------------+----------------------------+ 593 | Precision Degradation | Generalisation | 594 | Binning | Generalisation | 595 | Random noise addition | Direct or Set Substitution | 596 +-----------------------+----------------------------+ 598 4.4.1. Precision Degradation 600 As with precision degradation in timestamps, precision degradation of 601 counters removes lower-order bits of the counters, treating all the 602 counters in a given range as having the same value. Depending on the 603 precision reduction, this loses information about the relationships 604 between sizes of similarly-sized flows, but keeps relative magnitude 605 information. 607 4.4.2. 
Binning 609 Binning can be seen as a special case of precision degradation; the 610 operation is identical, except that in precision degradation the 611 counter ranges are uniform, and in binning they need not be. For 612 example, a common counter binning scheme for packet counters could be 613 to bin values 1-2 together, and 3-infinity together, thereby 614 separating potentially completely-opened TCP connections from 615 unopened ones. Binning schemes are generally chosen to keep 616 precisely the amount of information required in a counter for a given 617 analysis task. Note that, also unlike precision degradation, the bin 618 label need not be within the bin's range. 620 Binning counters to a single bin 0-infinity, or alternately precision 621 degradation to infinitely low precision, is equivalent to black- 622 marker anonymisation. Removal of counter information is only 623 recommended for analysis tasks which have no need to evaluate the 624 removed counter, for example for counting only unique occurrences of 625 other flow keys. 627 4.4.3. Random Noise Addition 629 Random noise addition adds a random amount to a counter in each flow; 630 this is used to keep relative magnitude information and minimise the 631 disruption to size relationship information while avoiding 632 fingerprinting attacks against anonymisation. Note that there is no 633 guarantee that random noise addition will maintain ranking order by a 634 counter among members of a set. Random noise addition is 635 particularly useful when the derived analysis data will not be 636 presented in such a way as to require the lower-order bits of the 637 counters. 639 4.5. Anonymisation of Other Flow Fields 641 Other fields, particularly port numbers and protocol numbers, can be 642 used to partially identify the applications that generated the 643 traffic in a given flow trace.
This information can be used in 644 fingerprinting attacks, and may be of interest on its own (e.g., to 645 reveal that a certain application with suspected vulnerabilities is 646 running on a given network). These fields are generally anonymised 647 using one of two techniques. 649 +--------------------+---------------------+ 650 | Scheme | Action | 651 +--------------------+---------------------+ 652 | Binning | Generalisation | 653 | Random Permutation | Direct Substitution | 654 +--------------------+---------------------+ 656 4.5.1. Binning 658 Binning is a generalisation technique mapping a set of potentially 659 non-uniform ranges into a set of arbitrarily labeled bins. Common 660 bin arrangements depend on the field type and the analysis 661 application. For example, an IP protocol bin arrangement may 662 preserve 1, 6, and 17 for ICMP, TCP, and UDP traffic, and bin all 663 other protocols into a single bin, to mitigate the use of uncommon 664 protocols in fingerprinting attacks. Another example arrangement may 665 bin source and destination ports into low (0-1023) and high (1024- 666 65535) bins in order to tell service from ephemeral ports without 667 identifying individual applications. 669 Binning other flow key fields to a single bin is equivalent to black- 670 marker anonymisation. Removal of other flow key information is only 671 recommended for analysis tasks which have no need to differentiate 672 flows on the removed keys, for example for total traffic counts or 673 unique counts of other flow keys. 675 4.5.2. Random Permutation 677 Random permutation is a direct substitution technique, replacing each 678 value with a value randomly selected from the set of possible values, 679 guaranteeing that each anonymised value represents a unique original 680 value. This is used to preserve the count of unique values without 681 preserving information about, or the ordering of, the values 682 themselves. 684 5.
Parameters for the Description of Anonymisation Techniques 686 This section details the abstract parameters used to describe the 687 anonymisation techniques examined in the previous section, on a per- 688 parameter basis. These parameters and their export safety inform the 689 design of the IPFIX anonymisation metadata export specified in the 690 following section. 692 5.1. Stability 694 Any given anonymisation technique may be applied with a varying range 695 of stability. Stability is important for assessing the comparability 696 of anonymised information in different data sets, or in the same data 697 set over different time periods. In general, stability ranges from 698 completely stable to completely unstable; however, note that the 699 completely unstable case is indistinguishable from black-marker 700 anonymisation. A completely stable anonymisation will always map a 701 given value in the real space to the same value in the anonymised 702 space. In practice, an anonymisation may also be stable for every 703 data set published by a particular producer to a particular 704 consumer, stable for a stated time period within a dataset or across 705 datasets, or stable only for a single data set. 707 If no information about stability is available, users of anonymised 708 data may assume that the techniques used are stable across the entire 709 dataset, but unstable across datasets. Note that stability presents 710 a risk-utility tradeoff, as completely stable anonymisation can be 711 used for longer-term trend analysis tasks but also presents more risk 712 of attack given the stable mapping. 714 5.2. Truncation Length 716 Truncation and precision degradation are described by the truncation 717 length, or the amount of data still remaining in the anonymised field 718 after anonymisation. 720 Truncation length can be inferred from a given data set, and need not 721 be specially exported or protected. 723 5.3.
Bin Map 725 Binning is described by the specification of a bin mapping function. 726 This function can be generally expressed in terms of an associative 727 array that maps each point in the original space to a bin, although 728 from an implementation standpoint most bin functions are much simpler 729 and more efficient. 731 Since knowledge of the bin mapping function can be used to partially 732 deanonymise binned data, depending on the degree of generalisation, 733 no information about the bin mapping function should be exported. 735 5.4. Permutation 737 Like binning, permutation is described by the specification of a 738 permutation function. In the general case, this can be expressed in 739 terms of an associative array that maps each point in the original 740 space to a point in the anonymised space. Unlike binning, each point 741 in the anonymised space must correspond to a single, unique point in 742 the original space. 744 Since knowledge of the permutation function can be used to completely 745 deanonymise permuted data, no information about the permutation 746 function or its parameters should be exported. 748 5.5. Shift Amount 750 Shifting requires an amount to shift each value by. Since the shift 751 amount can be used to deanonymise data protected by shifting, no 752 information about the shift amount should be exported. 754 6. Anonymisation Export Support in IPFIX 756 Anonymised data exported via IPFIX SHOULD be annotated with 757 anonymisation metadata, which details which fields described by which 758 Templates are anonymised, and provides appropriate information on the 759 anonymisation techniques used. This metadata SHOULD be exported in 760 Data Records described by the recommended Options Templates described 761 in this section; these Options Templates use the additional 762 Information Elements described in the following subsection.
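As a non-normative illustration, the parameters described in Section 5 (truncation length, bin map, permutation, and shift amount) might be sketched in Python as follows. The function names, the choice of 255 as the label of the catch-all protocol bin, and the seeded generator are assumptions made for this sketch only; none of them is defined by this document.

```python
import random


def degrade_precision(timestamp_ms, keep_ms=1000):
    """Truncation length: keep only the part coarser than keep_ms."""
    return timestamp_ms - (timestamp_ms % keep_ms)


def make_protocol_bin_map(preserved=(1, 6, 17), other_label=255):
    """Bin map: associative array over the 8-bit IP protocol space.

    Preserved protocols keep their own value; everything else shares
    one bin. The label 255 is an arbitrary choice for this sketch.
    """
    return {p: (p if p in preserved else other_label) for p in range(256)}


def make_permutation(space_size, rng):
    """Permutation: one-to-one map from original to anonymised values."""
    anonymised = list(range(space_size))
    rng.shuffle(anonymised)
    return dict(enumerate(anonymised))


def shift_all(timestamps, shift_amount):
    """Shift amount: the same offset is added to every timestamp."""
    return [t + shift_amount for t in timestamps]


# A seeded generator keeps the sketch repeatable; a deployment would
# presumably want an unpredictable source such as random.SystemRandom.
rng = random.Random(2009)
port_map = make_permutation(65536, rng)
protocol_bins = make_protocol_bin_map()
```

In this sketch the bin map, the permutation table and the shift amount remain private to the anonymising process, in line with the export recommendations of Section 5; only the truncation length is visible in the output, since it can be inferred from the data.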
764 Note that fields anonymised using the black-marker (removal) 765 technique do not require any special metadata support. Black-marker 766 anonymised fields SHOULD NOT be exported at all; the absence of the 767 field in a given Data Set is implicitly declared by not including the 768 corresponding Information Element in the Template describing that 769 Data Set; exporting "empty" data elements is inefficient and in the 770 general case impossible, as many non-counter Information Elements do 771 not have semantically distinct null values. 773 6.1. Anonymisation Options Template 775 The Anonymisation Options Template describes anonymisation records, 776 which allow anonymisation metadata to be exported inline over IPFIX 777 or stored in an IPFIX File, by binding information about 778 anonymisation techniques to Information Elements within defined 779 Templates. IPFIX Exporting Processes SHOULD export anonymisation 780 records for any Template describing exported anonymised Data Records; 781 IPFIX Collecting Processes and processes downstream from them MAY use 782 anonymisation records to treat anonymised data differently depending 783 on the applied technique. 785 An Exporting Process SHOULD export anonymisation records after the 786 Templates they describe have been exported, and SHOULD export 787 anonymisation records reliably. 789 Anonymisation records, like Templates, MUST be handled by Collecting 790 Processes as scoped to the Transport Session in which they are sent. 791 While the anonymisationStability IE can be used to declare that a 792 given anonymisation technique's mapping will remain stable across 793 multiple sessions, each session MUST re-export the anonymisation 794 Records along with the templates. 796 [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the 797 same time is indicated with multiple elements of the same type (in 798 application order as in PSAMP). Need to verify this is actually 799 useful given the defined techniques.] 
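As a non-normative aid, the content of an anonymisation record might be modelled as in the following Python sketch. The numeric values are those given for anonymisationStability and anonymisationTechnique in Section 6.2; the class and attribute names are assumptions of this sketch, and no wire encoding is shown because the ElementIds are still TBD.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Stability(IntEnum):
    UNDEFINED = 0
    SESSION = 1
    EXPORTER_COLLECTOR_PAIR = 2
    STABLE = 3


class Technique(IntEnum):
    UNDEFINED = 0
    NONE = 1
    PRECISION_DEGRADATION = 2
    BINNING = 3
    ENUMERATION = 4
    PERMUTATION = 5
    STRUCTURED_PERMUTATION = 6


@dataclass(frozen=True)
class AnonymisationRecord:
    template_id: int                 # scope field
    information_element_id: int      # scope field
    stability: Stability
    technique: Technique
    # Optional scope field; only needed when the Template contains
    # several instances of the same Information Element.
    information_element_index: Optional[int] = None

    def scope(self):
        """Return the scope fields that identify the described field."""
        key = (self.template_id, self.information_element_id)
        if self.information_element_index is not None:
            key += (self.information_element_index,)
        return key


# Hypothetical example: Template 256 carries sourceIPv4Address (IE 8),
# anonymised with a prefix-preserving technique stable for the session.
record = AnonymisationRecord(
    template_id=256,
    information_element_id=8,
    stability=Stability.SESSION,
    technique=Technique.STRUCTURED_PERMUTATION,
)
```

A Collecting Process could index such records by their scope in order to look up how a given field of a given Template was anonymised.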
801 +-------------------------+-----------------------------------------+ 802 | IE | Description | 803 +-------------------------+-----------------------------------------+ 804 | templateId [scope] | The Template ID of the Template | 805 | | containing the Information Element | 806 | | described by this anonymisation record. | 807 | | This Information Element MUST be | 808 | | defined as a Scope Field. | 809 | informationElementId | The Information Element identifier of | 810 | [scope] | the Information Element described by | 811 | | this anonymisation record. This | 812 | | Information Element MUST be defined as | 813 | | a Scope Field. | 814 | informationElementIndex | The Information Element index of the | 815 | [scope] [optional] | instance of the Information Element | 816 | | described by this anonymisation record | 817 | | identified by the informationElementId | 818 | | within the Template. Optional; need | 819 | | only be present when describing | 820 | | Templates that have multiple instances | 821 | | of the same Information Element. This | 822 | | Information Element MUST be defined as | 823 | | a Scope Field if present. This | 824 | | Information Element is defined in | 825 | | Section 6.2, below. | 826 | anonymisationStability | The stability class of the anonymised | 827 | | data. MUST be present. This | 828 | | Information Element is defined in | 829 | | Section 6.2, below. | 830 | anonymisationTechnique | The technique used to anonymise the | 831 | | data. MUST be present. This | 832 | | Information Element is defined in | 833 | | Section 6.2, below. | 834 +-------------------------+-----------------------------------------+ 836 6.2. Recommended Information Elements for Anonymisation Metadata 838 6.2.1. anonymisationStability 840 Description: A description of the stability class of the 841 anonymisation technique applied to a referenced Information 842 Element within a referenced Template. 
Stability classes refer to 843 the stability of the parameters of the anonymisation technique, 844 and therefore the comparability of the mapping between the real 845 and anonymised values over time. This determines which anonymised 846 datasets may be compared with each other. 848 +-------+-----------------------------------------------------------+ 849 | Value | Description | 850 +-------+-----------------------------------------------------------+ 851 | 0 | Undefined: the Exporting Process makes no representation | 852 | | as to how stable the mapping is, or over what time period | 853 | | values of this field will remain comparable; while the | 854 | | Collecting Process MAY assume Session level stability, | 855 | | Session level stability is not guaranteed. This is | 856 | | equivalent to 0x01 Session level stability while advising | 857 | | the Collecting Process that no special effort has been | 858 | | made to ensure stability. Collecting Processes SHOULD | 859 | | assume this is the case in the absence of stability class | 860 | | information; this is the default stability class. | 861 | 1 | Session: the Exporting Process will ensure that the | 862 | | parameters of the anonymisation technique are stable | 863 | | during the Transport Session. All the values of the | 864 | | described Information Element for each Record described | 865 | | by the referenced Template within the Transport Session | 866 | | are comparable. The Exporting Process SHOULD endeavour | 867 | | to ensure at least this stability class. | 868 | 2 | Exporter-Collector Pair: the Exporting Process will | 869 | | ensure that the parameters of the anonymisation technique | 870 | | are stable across Transport Sessions over time with the | 871 | | given Collecting Process, but may use different | 872 | | parameters for different Collecting Processes. Data | 873 | | exported to different Collecting Processes is not | 874 | | comparable. 
| 875 | 3 | Stable: the Exporting Process will ensure that the | 876 | | parameters of the anonymisation technique are stable | 877 | | across Transport Sessions over time, regardless of the | 878 | | Collecting Process to which it is sent. | 879 +-------+-----------------------------------------------------------+ 881 Abstract Data Type: unsigned8 883 ElementId: TBD1 885 Status: Proposed 887 6.2.2. anonymisationTechnique 889 Description: A description of the anonymisation technique applied 890 to a referenced Information Element within a referenced Template. 891 Each technique may be applicable only to certain Information 892 Elements and recommended only for certain Information Elements; 893 these restrictions are noted in the table below. 895 +-------+--------------------------------+------------+-------------+ 896 | Value | Description | Applicable | Recommended | 897 | | | to | for | 898 +-------+--------------------------------+------------+-------------+ 899 | 0 | Undefined: the Exporting | all | all | 900 | | Process makes no | | | 901 | | representation as to whether | | | 902 | | the defined field is | | | 903 | | anonymised or not. While the | | | 904 | | Collecting Process MAY assume | | | 905 | | that the field is not | | | 906 | | anonymised, it is not | | | 907 | | guaranteed not to be. This is | | | 908 | | the default anonymisation | | | 909 | | technique. | | | 910 | 1 | None: the values exported are | all | all | 911 | | real. | | | 912 | 2 | Precision | all | all | 913 | | Degradation/Truncation: the | | | 914 | | values exported are anonymised | | | 915 | | using simple precision | | | 916 | | degradation or truncation. | | | 917 | | The new precision is implicit | | | 918 | | in the exported data, and can | | | 919 | | be deduced by the Collecting | | | 920 | | Process. | | | 921 | 3 | Binning: the values exported | all | all | 922 | | are anonymised into bins.
| | | 923 | 4 | Enumeration: the values | all | timestamps | 924 | | exported are anonymised by | | | 925 | | enumeration. | | | 926 | 5 | Permutation: the values | all | identifiers | 927 | | exported are anonymised by | | | 928 | | random permutation. | | | 929 | 6 | Structured Permutation: the | addresses | | 930 | | values exported are anonymised | | | 931 | | by random permutation, | | | 932 | | preserving bit-level structure | | | 933 | | as appropriate; this | | | 934 | | represents prefix-preserving | | | 935 | | IP address anonymisation or | | | 936 | | structured MAC address | | | 937 | | anonymisation. | | | 938 +-------+--------------------------------+------------+-------------+ 939 Abstract Data Type: unsigned8 941 ElementId: TBD2 943 Status: Proposed 945 6.2.3. informationElementIndex 947 Description: A zero-based index of an Information Element 948 referenced by informationElementId within a Template referenced by 949 templateId; used to disambiguate scope for templates containing 950 multiple identical Information Elements. 952 Abstract Data Type: unsigned16 954 ElementId: TBD3 956 Status: Proposed 958 7. Applying Anonymisation Techniques to IPFIX Export and Storage 960 When exporting or storing anonymised flow data using IPFIX, certain 961 interactions between the IPFIX Protocol and the anonymisation 962 techniques in use must be considered; these are treated in the 963 subsections below. 965 7.1. Arrangement of Processes in IPFIX Anonymisation 967 Anonymisation may be applied to IPFIX data at three stages within 968 the collection infrastructure: on initial export, at a mediator, or 969 after collection, as shown in Figure 2. Each of these locations has 970 specific considerations and applicability.
972 +==========================================+ 973 | Exporting Process | 974 +==========================================+ 975 | | 976 | (Anonymised at Original Exporter) | 977 V | 978 +=============================+ | 979 | Mediator | | 980 +=============================+ | 981 | | 982 | (Anonymising Mediator) | 983 V V 984 +==========================================+ 985 | Collecting Process | 986 +==========================================+ 987 | 988 | (Anonymising CP/File Writer) 989 V 990 +--------------------+ 991 | IPFIX File Storage | 992 +--------------------+ 994 Figure 2: Potential Anonymisation Locations 996 Anonymisation is generally performed before the wider dissemination 997 or repurposing of a flow data set, e.g., adapting operational 998 measurement data for research. Therefore, direct anonymisation of 999 flow data on initial export is only applicable in certain restricted 1000 circumstances: when the Exporting Process is "publishing" data to a 1001 Collecting Process directly, and the Exporting Process and Collecting 1002 Process are operated by different entities. Note that certain 1003 guidelines in Section 7.2.2 with respect to timestamp anonymisation 1004 may not apply in this case, as the Collecting Process may be able to 1005 deduce certain timing information from the time at which each Message 1006 is received. 1008 A much more flexible arrangement is to anonymise data within a 1009 Mediator [I-D.ietf-ipfix-mediators-framework]. Here, original data 1010 is sent to a Mediator, which performs the anonymisation function and 1011 re-exports the anonymised data. Such a Mediator could be located at 1012 the administrative domain boundary of the initial Exporting Process 1013 operator, exporting anonymised data to other consumers outside the 1014 organisation. 
In this case, the original Exporter SHOULD use TLS as 1015 specified in [RFC5101] to secure the channel to the Mediator, and the 1016 Mediator should follow the guidelines in Section 7.2, to mitigate the 1017 risk of original data disclosure. 1019 When data is to be published as an anonymised data set in an IPFIX 1020 File [I-D.ietf-ipfix-file], the anonymisation may be done at the 1021 final Collecting Process before storage and dissemination, as well. 1022 In this case, the Collector should follow the guidelines in 1023 Section 7.2, especially as regards File-specific Options in 1024 Section 7.2.3. 1026 In each of these data flows, the anonymisation of records is 1027 undertaken by an Intermediate Anonymisation Process (IAP); the data 1028 flows into and out of this IAP are shown in Figure 3 below. 1030 packets --+ +- IPFIX Messages -+ 1031 | | | 1032 V V V 1033 +==================+ +====================+ +=============+ 1034 | Metering Process | | Collecting Process | | File Reader | 1035 +==================+ +====================+ +=============+ 1036 | Non-anonymised | Records | 1037 V V V 1038 +=========================================================+ 1039 | Intermediate Anonymisation Process (IAP) | 1040 +=========================================================+ 1041 | Anonymised ^ Anonymised | 1042 | Records | Records | 1043 V | V 1044 +===================+ Anonymisation +=============+ 1045 | Exporting Process |<--- Parameters ------>| File Writer | 1046 +===================+ +=============+ 1047 | | 1048 +------------> IPFIX Messages <----------+ 1050 Figure 3: Data flows through the anonymisation process 1052 Anonymisation parameters must also be available to the Exporting 1053 Process and/or File Writer in order to ensure header data is also 1054 appropriately anonymised as in Section 7.2.2. 1056 Following each of the data flows through the IAP, we describe five 1057 basic types of anonymisation arrangements within this framework in 1058 Figure 4.
In addition to the three arrangements described in detail 1059 above, anonymisation can also be done at a collocated Metering 1060 Process and File Writer (see section 7.3.2 of [I-D.ietf-ipfix-file]), 1061 or at a file manipulator (see section 7.3.7 of 1062 [I-D.ietf-ipfix-file]). 1064 +----+ +-----+ +----+ 1065 pkts -> | MP |->| IAP |->| EP |-> anonymisation on Original Exporter 1066 +----+ +-----+ +----+ 1067 +----+ +-----+ +----+ 1068 pkts -> | MP |->| IAP |->| FW |-> Anonymising collocated MP/File Writer 1069 +----+ +-----+ +----+ 1070 +----+ +-----+ +----+ 1071 IPFIX -> | CP |->| IAP |->| EP |-> Anonymising Mediator (Masquerading Proxy) 1072 +----+ +-----+ +----+ 1073 +----+ +-----+ +----+ 1074 IPFIX -> | CP |->| IAP |->| FW |-> Anonymising collocated CP/File Writer 1075 +----+ +-----+ +----+ 1076 +----+ +-----+ +----+ 1077 IPFIX -> | FR |->| IAP |->| FW |-> Anonymising file manipulator 1078 File +----+ +-----+ +----+ 1080 Figure 4: Possible anonymisation arrangements in the IPFIX 1081 architecture 1083 Note that anonymisation may occur at more than one location within a 1084 given collection infrastructure, to provide varying levels of 1085 anonymisation, disclosure risk, or data utility for specific 1086 purposes. 1088 7.2. IPFIX-Specific Anonymisation Guidelines 1090 In implementing and deploying the anonymisation techniques described 1091 in this document, implementors should note that IPFIX already 1092 provides features that support anonymised data export, and use these 1093 where appropriate. Care must also be taken that data structures 1094 supporting the operation of the protocol itself do not leak data that 1095 could be used to reverse the anonymisation applied to the flow data. 1096 Such data structures may appear in the header, or within the data 1097 stream itself, especially as options data. Each of these and their 1098 impact on specific anonymisation techniques is noted in a separate 1099 subsection below. 1101 7.2.1. 
Appropriate Use of Information Elements for Anonymised Data 1103 Note, as in Section 6 above, that black-marker anonymised fields 1104 SHOULD NOT be exported at all; the absence of the field in a given 1105 Data Set is implicitly declared by not including the corresponding 1106 Information Element in the Template describing that Data Set. 1108 When using precision degradation of timestamps, Exporting Processes 1109 SHOULD export timing information using Information Elements of an 1110 appropriate precision, as explained in Section 4.5 of [RFC5153]. For 1111 example, timestamps measured in millisecond-level precision and 1112 degraded to second-level precision should use flowStartSeconds and 1113 flowEndSeconds, not flowStartMilliseconds and flowEndMilliseconds. 1115 When exporting anonymised data and anonymisation metadata, Exporting 1116 Processes SHOULD ensure that the Information Element 1117 and the declared anonymisation technique are compatible. Specifically, 1118 the applicable and recommended Information Element types and 1119 semantics for each technique are noted in the description of the 1120 anonymisationTechnique Information Element in Section 6.2.2. In this 1121 description, a timestamp is an Information Element with the data type 1122 dateTimeSeconds, dateTimeMilliseconds, dateTimeMicroseconds, or 1123 dateTimeNanoseconds; an address is an Information Element with the 1124 data type ipv4Address, ipv6Address, or macAddress; and an identifier 1125 is an Information Element with identifier data type semantics. 1126 Exporting Processes MUST NOT export Anonymisation Options records 1127 binding techniques to Information Elements to which they are not 1128 applicable, and SHOULD NOT export Anonymisation Options records 1129 binding techniques to Information Elements for which they are not 1130 recommended. 1132 7.2.2.
Anonymisation of Header Data 1134 Each IPFIX Message contains a Message Header; within this Message 1135 Header are contained two fields which may be used to break certain 1136 anonymisation techniques: the Export Time, and the Observation Domain 1137 ID. 1139 Exporting Processes sending IPFIX Messages containing anonymised timestamp data, where 1140 the original Export Time Message Header field has some relationship to the 1141 anonymised timestamps, SHOULD anonymise the Export Time header field 1142 using an equivalent technique, if possible. Otherwise, relationships 1143 between export and flow time could be used to partially or totally 1144 reverse timestamp anonymisation. 1146 The similarity in size between an Observation Domain ID and an IPv4 1147 address (32 bits) may lead to a temptation to use an IPv4 interface 1148 address on the Metering or Exporting Process as the Observation 1149 Domain ID. If this address bears some relation to the IP addresses 1150 in the flow data (e.g., shares a network prefix with internal 1151 addresses) and the IP addresses in the flow data are anonymised in a 1152 structure-preserving way, then the Observation Domain ID may be used 1153 to break the IP address anonymisation. Use of an IPv4 interface 1154 address on the Metering or Exporting Process as the Observation 1155 Domain ID is NOT RECOMMENDED in this case. 1157 7.2.3. Anonymisation of Options Data 1159 IPFIX uses the Options mechanism to export, among other things, 1160 metadata about exported flows and the flow collection infrastructure. 1161 As with the IPFIX Message Header, certain Options recommended in 1162 [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing 1163 flow timestamps and network addresses of Exporting and Collecting 1164 Processes may be used to break certain anonymisation techniques; care 1165 should be taken while using them with anonymised data export and 1166 storage.
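The timing leak that both the Message Header (Section 7.2.2) and the Options discussed in this section can cause may be illustrated with a small, non-normative Python sketch. It assumes random time shift anonymisation (Section 4.3.3) with timestamps in seconds, and an observer who guesses that a Message is exported shortly after its latest flow timestamp; the function names and that guessing heuristic are assumptions of this sketch.

```python
def shift_message_times(export_time_s, flow_times_s, shift_s):
    """Apply one shift amount to the Export Time and all flow timestamps."""
    shifted_flows = [t + shift_s for t in flow_times_s]
    return export_time_s + shift_s, shifted_flows


def estimate_shift(export_time_s, anonymised_flow_times_s):
    """Observer's guess at the shift amount.

    Assumes the Message was exported shortly after its latest flow
    timestamp, so the gap between the two approximates the shift
    whenever the Export Time was left unanonymised.
    """
    return max(anonymised_flow_times_s) - export_time_s
```

For instance, with flow timestamps of 1000, 1010 and 1020, an original Export Time of 1025 and a shift amount of 259200, an unshifted Export Time gives the observer an estimate of 259195, within five seconds of the real shift. When the Export Time is shifted by the same amount, the estimate is -5, which reflects only the export delay and not the shift amount.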
1168 The Exporting Process Reliability Statistics Options Template, 1169 recommended in [RFC5101], contains an Exporting Process ID field, 1170 which may be an exportingProcessIPv4Address Information Element or an 1171 exportingProcessIPv6Address Information Element. If the Exporting 1172 Process address bears some relation to the IP addresses in the flow 1173 data (e.g., shares a network prefix with internal addresses) and the 1174 IP addresses in the flow data are anonymised in a structure- 1175 preserving way, then the Exporting Process address may be used to 1176 break the IP address anonymisation. Exporting Processes exporting 1177 anonymised data in this situation SHOULD mitigate the risk of attack 1178 either by omitting Options described by the Exporting Process 1179 Reliability Statistics Options Template, or by anonymising the 1180 Exporting Process address using a similar technique to that used to 1181 anonymise the IP addresses in the exported data. 1183 Similarly, the Export Session Details Options Template and Message 1184 Details Options Template specified for the IPFIX File Format 1185 [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address 1186 Information Element or the exportingProcessIPv6Address Information 1187 Element to identify an Exporting Process from which a flow record was 1188 received, and the collectingProcessIPv4Address Information Element or 1189 the collectingProcessIPv6Address Information Element to identify the 1190 Collecting Process which received it. If the Exporting Process or 1191 Collecting Process address bears some relation to the IP addresses in 1192 the flow data (e.g., shares a network prefix with internal addresses) 1193 and the IP addresses in the flow data are anonymised in a structure- 1194 preserving way, then the Exporting Process or Collecting Process 1195 address may be used to break the IP address anonymisation. 
Since 1196 these Options Templates are primarily intended for storing IPFIX 1197 Transport Session data for auditing, replay, and testing purposes, it 1198 is NOT RECOMMENDED that storage of anonymised data include these 1199 Options Templates in order to mitigate the risk of attack. 1201 The Message Details Options Template specified for the IPFIX File 1202 Format [I-D.ietf-ipfix-file] also contains the 1203 collectionTimeMilliseconds Information Element. As with the Export 1204 Time Message Header field, if the exported flow data contains 1205 anonymised timestamp information, and the collectionTimeMilliseconds 1206 Information Element in a given Message has some relationship to the 1207 anonymised timestamp information, then this relationship can be 1208 exploited to reverse the timestamp anonymisation. Since this Options 1209 Template is primarily intended for storing IPFIX Transport Session 1210 data for auditing, replay, and testing purposes, it is NOT 1211 RECOMMENDED that storage of anonymised data include this Options 1212 Template in order to mitigate the risk of attack. 1214 Since the Time Window Options Template specified for the IPFIX File 1215 Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow 1216 data to provide partial table of contents information for an IPFIX 1217 File, care must be taken to ensure that Options described by this 1218 template are written using the anonymised timestamps instead of the 1219 original ones. 1221 8. Examples 1223 [TODO: write this section.] 1225 9. Security Considerations 1227 [TODO: write this section.] 1229 10. IANA Considerations 1231 This document specifies the creation of several new IPFIX Information 1232 Elements in the IPFIX Information Element registry located at 1233 http://www.iana.org/assignments/ipfix, as defined in Section 6.2 1234 above. 
IANA has assigned the following Information Element numbers 1235 for their respective Information Elements as specified below: 1237 o Information Element number TBD1 for the anonymisationStability 1238 Information Element. 1240 o Information Element number TBD2 for the anonymisationTechnique 1241 Information Element. 1243 o Information Element number TBD3 for the informationElementIndex 1244 Information Element. 1246 [NOTE for IANA: The text TBDn should be replaced with the respective 1247 assigned Information Element numbers where they appear in this 1248 document.] 1250 11. Acknowledgments 1252 We thank Paul Aitken for his comments and insight, and the PRISM 1253 project for its support of this work. 1255 12. References 1257 12.1. Normative References 1259 [RFC5101] Claise, B., "Specification of the IP Flow Information 1260 Export (IPFIX) Protocol for the Exchange of IP Traffic 1261 Flow Information", RFC 5101, January 2008. 1263 [RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. 1264 Meyer, "Information Model for IP Flow Information Export", 1265 RFC 5102, January 2008. 1267 12.2. Informative References 1269 [RFC5472] Zseby, T., Boschi, E., Brownlee, N., and B. Claise, "IP 1270 Flow Information Export (IPFIX) Applicability", RFC 5472, 1271 March 2009. 1273 [RFC5470] Sadasivan, G., Brownlee, N., Claise, B., and J. Quittek, 1274 "Architecture for IP Flow Information Export", RFC 5470, 1275 March 2009. 1277 [I-D.ietf-ipfix-file] 1278 Trammell, B., Boschi, E., Mark, L., Zseby, T., and A. 1279 Wagner, "Specification of the IPFIX File Format", 1280 draft-ietf-ipfix-file-04 (work in progress), July 2009. 1282 [I-D.ietf-ipfix-mediators-framework] 1283 Kobayashi, A., Nishida, H., and B. Claise, "IPFIX 1284 Mediation: Framework", 1285 draft-ietf-ipfix-mediators-framework-02 (work in 1286 progress), February 2009. 1288 [I-D.ietf-ipfix-mediators-problem-statement] 1289 Kobayashi, A., Claise, B., Nishida, H., Sommer, C., 1290 Dressler, F., and E. 
Stephan, "IPFIX Mediation: Problem 1291 Statement", 1292 draft-ietf-ipfix-mediators-problem-statement-03 (work in 1293 progress), April 2009. 1295 [RFC5153] Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P. 1296 Aitken, "IP Flow Information Export (IPFIX) Implementation 1297 Guidelines", RFC 5153, April 2008. 1299 [RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, 1300 "Requirements for IP Flow Information Export (IPFIX)", 1301 RFC 3917, October 2004. 1303 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1304 Requirement Levels", BCP 14, RFC 2119, March 1997. 1306 Authors' Addresses 1308 Elisa Boschi 1309 Hitachi Europe 1310 c/o ETH Zurich 1311 Gloriastrasse 35 1312 8092 Zurich 1313 Switzerland 1315 Phone: +41 44 632 70 57 1316 Email: elisa.boschi@hitachi-eu.com 1318 Brian Trammell 1319 Hitachi Europe 1320 c/o ETH Zurich 1321 Gloriastrasse 35 1322 8092 Zurich 1323 Switzerland 1325 Phone: +41 44 632 70 13 1326 Email: brian.trammell@hitachi-eu.com