IPFIX Working Group                                            E. Boschi
Internet-Draft                                               B. Trammell
Intended status: Experimental                             Hitachi Europe
Expires: July 16, 2009                                  January 12, 2009


                     IP Flow Anonymisation Support
                     draft-boschi-ipfix-anon-02.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 16, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Abstract

   This document describes anonymisation techniques for IP flow data and
   the export of anonymised data using the IPFIX protocol.  It provides
   a categorisation of common anonymisation schemes and defines the
   parameters needed to describe them.
   It provides guidelines for the
   implementation of anonymised data export and storage over IPFIX, and
   describes an Options-based method for anonymisation metadata export
   within the IPFIX protocol, providing the basis for the definition of
   information models for configuring anonymisation techniques within an
   IPFIX Metering or Exporting Process, and for reporting the technique
   in use to an IPFIX Collecting Process.

Table of Contents

   1.  Introduction
     1.1.  IPFIX Protocol Overview
     1.2.  IPFIX Documents Overview
   2.  Terminology
   3.  Categorisation of Anonymisation Techniques
   4.  Anonymisation of IP Flow Data
     4.1.  IP Address Anonymisation
       4.1.1.  Truncation
       4.1.2.  Random Permutation
       4.1.3.  Prefix-preserving Pseudonymisation
     4.2.  Timestamp Anonymisation
       4.2.1.  Precision Degradation
       4.2.2.  Enumeration
       4.2.3.  Random Time Shifts
     4.3.  Counter Anonymisation
       4.3.1.  Precision Degradation
       4.3.2.  Binning
       4.3.3.  Random Noise Addition
     4.4.  Anonymisation of Other Flow Fields
   5.  Applying Anonymisation Techniques to IPFIX Export and Storage
     5.1.  Arrangement of Processes in IPFIX Anonymisation
     5.2.  IPFIX-Specific Anonymisation Guidelines
       5.2.1.  Anonymisation of Header Data
       5.2.2.  Anonymisation of Options Data
   6.  Parameters for the Description of Anonymisation Techniques
   7.  Anonymisation Metadata Support in IPFIX
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Authors' Addresses

1.  Introduction

   The standardisation of an IP flow information export protocol
   [RFC5101] and associated representations removes a technical barrier
   to the sharing of IP flow data across organizational boundaries and
   with network operations, security, and research communities for a
   wide variety of purposes.  However, with wider dissemination come
   greater risks to the privacy of the users of networks under
   measurement, and to the security of those networks.  While it is not
   a complete solution to the issues posed by distribution of IP flow
   information, anonymisation is an important tool for the protection of
   privacy within network measurement infrastructures.

   This document presents a mechanism for representing anonymised data
   within IPFIX and guidelines for using it.  It begins with a
   categorisation of anonymisation techniques.
   It then describes the
   applicability of each technique to commonly anonymisable fields of IP
   flow data, organized by information element data type and semantics
   as in [RFC5102]; enumerates the parameters required by each of the
   applicable anonymisation techniques; and provides guidelines for the
   use of each of these techniques in accordance with best practices in
   data protection.  Finally, it specifies a mechanism for exporting
   anonymised data and binding anonymisation metadata to Templates using
   IPFIX Options.

1.1.  IPFIX Protocol Overview

   In the IPFIX protocol, { type, length, value } tuples are expressed
   in Templates containing { type, length } pairs, specifying which
   { value } fields are present in Data Records conforming to the
   Template, giving great flexibility as to what data is transmitted.
   Since Templates are sent very infrequently compared with Data
   Records, this results in significant bandwidth savings.  Various
   different data formats may be transmitted simply by sending new
   Templates specifying the { type, length } pairs for the new data
   format.  See [RFC5101] for more information.

   The IPFIX information model [RFC5102] defines a large number of
   standard Information Elements which provide the necessary { type }
   information for Templates.  The use of standard elements enables
   interoperability among different vendors' implementations.
   Additionally, non-standard enterprise-specific elements may be
   defined for private use.

1.2.  IPFIX Documents Overview

   "Specification of the IPFIX Protocol for the Exchange of IP Traffic
   Flow Information" [RFC5101] and its associated documents define the
   IPFIX Protocol, which provides network engineers and administrators
   with access to IP traffic flow information.
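
   As an informal illustration of the Template mechanism summarised in
   Section 1.1, the following Python sketch splits one Data Record into
   its values using a list of { type, length } pairs.  It is a
   simplification and not part of the protocol definition: Message and
   Set headers are omitted, and the names TEMPLATE and decode_record are
   hypothetical.  The Information Element numbers are those assigned by
   the IPFIX information model.

```python
import struct
from ipaddress import IPv4Address

# Hypothetical, simplified Template: (Information Element ID, length in octets).
# 8 = sourceIPv4Address, 12 = destinationIPv4Address,
# 2 = packetDeltaCount, 1 = octetDeltaCount.
TEMPLATE = [(8, 4), (12, 4), (2, 8), (1, 8)]


def decode_record(template, data):
    """Split one Data Record into {IE ID: raw bytes} using the Template."""
    record, offset = {}, 0
    for ie_id, length in template:
        record[ie_id] = data[offset:offset + length]
        offset += length
    if offset != len(data):
        raise ValueError("record length does not match template")
    return record


# A Data Record carries only the values; type and length come from the
# Template, which is why Templates need to be sent only occasionally.
raw = (IPv4Address("192.0.2.1").packed
       + IPv4Address("198.51.100.7").packed
       + struct.pack("!QQ", 10, 1500))
rec = decode_record(TEMPLATE, raw)
src = IPv4Address(rec[8])
packets = int.from_bytes(rec[2], "big")
octets = int.from_bytes(rec[1], "big")
```

   Because the record layout is fully described by the Template, an
   anonymising process can locate the fields it needs to modify without
   any knowledge of the record format beyond the Template itself.
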

   "Architecture for IP Flow Information Export"
   [I-D.ietf-ipfix-architecture] defines the architecture for the export
   of measured IP flow information out of an IPFIX Exporting Process to
   an IPFIX Collecting Process, and the basic terminology used to
   describe the elements of this architecture, per the requirements
   defined in "Requirements for IP Flow Information Export" [RFC3917].
   The IPFIX Protocol document [RFC5101] then covers the details of the
   method for transporting IPFIX Data Records and Templates via a
   congestion-aware transport protocol from an IPFIX Exporting Process
   to an IPFIX Collecting Process.

   "Information Model for IP Flow Information Export" [RFC5102]
   describes the Information Elements used by IPFIX, including details
   on Information Element naming, numbering, and data type encoding.
   Finally, "IPFIX Applicability" [I-D.ietf-ipfix-as] describes the
   various applications of the IPFIX protocol and their use of
   information exported via IPFIX, and relates the IPFIX architecture to
   other measurement architectures and frameworks.

   Additionally, the "Specification of the IPFIX File Format"
   [I-D.ietf-ipfix-file] describes a file format based upon the IPFIX
   Protocol for the storage of flow data.

   This document references the Protocol and Architecture documents for
   terminology, and extends the IPFIX Information Model to provide new
   Information Elements for anonymisation metadata.  The anonymisation
   techniques described herein are equally applicable to the IPFIX
   Protocol and data stored in IPFIX Files.

2.  Terminology

   Terms used in this document that are defined in the Terminology
   section of the IPFIX Protocol [RFC5101] document are to be
   interpreted as defined there.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   RFC 2119 [RFC2119].

3.  Categorisation of Anonymisation Techniques

   Anonymisation modifies a data set in order to protect the identity of
   the people or entities described by the data set from disclosure.

   With respect to network traffic data, anonymisation generally
   attempts to preserve some set of properties of the network traffic
   useful for a given application or applications, while ensuring the
   data cannot be traced back to the specific networks, hosts, or users
   generating the traffic.

   Anonymisation may be broadly classified according to two properties:
   recoverability and countability.  All anonymisation techniques map
   the real space of identifiers or values into a separate, anonymised
   space, according to some function.  A technique is said to be
   recoverable when the function used is invertible or can otherwise be
   reversed, so that a real identifier can be recovered from a given
   replacement identifier.

   Countability compares the dimension (i.e., the number of distinct
   values) of the anonymised space (N) to the dimension of the real
   space (M), and denotes how the count of unique values is preserved by
   the anonymisation function.  If the anonymised space is smaller than
   the real space, then the function is said to generalise the input,
   mapping more than one input point to each anonymous value (e.g., as
   with aggregation).  By definition, generalisation is not recoverable.

   If the dimensions of the anonymised and real spaces are the same,
   such that the count of unique values is preserved, then the function
   is said to be a direct substitution function.
   If the dimension of
   the anonymised space is larger, such that each real value maps to a
   set of anonymised values, then the function is said to be a set
   substitution function.  Note that with set substitution functions,
   the sets of anonymised values are not necessarily disjoint.  Either
   direct or set substitution functions are said to be one-way if there
   exists no method for recovering the real data point from an
   anonymised one.

   This classification is summarised in the table below.

   +------------------------+-----------------+------------------------+
   | Recoverability /       | Recoverable     | Non-recoverable        |
   | Countability           |                 |                        |
   +------------------------+-----------------+------------------------+
   | N < M                  | N.A.            | Generalisation         |
   | N = M                  | Direct          | One-way Direct         |
   |                        | Substitution    | Substitution           |
   | N > M                  | Set             | One-way Set            |
   |                        | Substitution    | Substitution           |
   +------------------------+-----------------+------------------------+

4.  Anonymisation of IP Flow Data

   Due to the restricted semantics of IP flow data, there is a
   relatively limited set of specific anonymisation techniques
   applicable to flow data, though each falls into the broad categories
   above.  Each type of field that may commonly appear in a flow record
   may have its own applicable specific techniques.

   While anonymisation is generally applied at the resolution of single
   fields within a flow record, attacks against anonymisation use entire
   flows and relationships between hosts and flows within a given data
   set.  Therefore, fields which may not necessarily be identifying by
   themselves may be anonymised in order to increase the anonymity of
   the data set as a whole.

   Of all the fields in an IP flow record, only IP addresses directly
   identify entities in the real world.
   Each IP address is associated
   with an interface on a network host, and can potentially be
   identified with a single user.  Additionally, IP addresses are
   structured identifiers; that is, partial IP address prefixes may be
   used to identify networks just as full IP addresses identify hosts.
   This makes anonymisation of IP addresses particularly important.

   Port numbers identify abstract entities (applications) as opposed to
   real-world entities, but they can be used to classify hosts and user
   behavior.  Passive port fingerprinting, both of well-known and
   ephemeral ports, can be used to determine the operating system
   running on a host.  Relative data volumes by port can also be used to
   determine the host's function (workstation, web server, etc.); this
   information can be used to identify hosts and users.

   While not identifiers in and of themselves, timestamps and counters
   can reveal the behavior of the hosts and users on a network.  Any
   given network activity is recognizable by a pattern of relative time
   differences and data volumes in the associated sequence of flows,
   even without host address information.  They can therefore be used to
   identify hosts and users.  Timestamps and counters are also
   vulnerable to traffic injection attacks, where traffic with a known
   pattern is injected into a network under measurement, and this
   pattern is later identified in the anonymised data set.

   The simplest and most extreme form of anonymisation, which can be
   applied to any field of a flow record, is black-marker anonymisation,
   or complete deletion of a given field.  Note that black-marker
   anonymisation is equivalent to simply not exporting the field(s) in
   question.

   While black-marker anonymisation completely protects the data in the
   deleted fields from the risk of disclosure, it also reduces the
   utility of the anonymised data set as a whole.
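
   Black-marker anonymisation as described above can be sketched in a
   few lines of Python.  This is an illustrative assumption about how a
   flow record might be held in memory (a dictionary keyed by
   Information Element name); the function name black_marker is
   hypothetical.

```python
def black_marker(record, fields):
    """Return a copy of a flow record with the given fields deleted.

    Deleting a field is equivalent to never exporting it, so the
    Template describing the result simply lacks those fields.
    """
    return {name: value for name, value in record.items()
            if name not in fields}


flow = {
    "sourceIPv4Address": "192.0.2.1",
    "destinationIPv4Address": "198.51.100.7",
    "sourceTransportPort": 49152,
    "destinationTransportPort": 80,
    "packetDeltaCount": 10,
}
anon = black_marker(flow, {"sourceIPv4Address", "sourceTransportPort"})
```

   The original record is left untouched, and the anonymised copy keeps
   only the fields that were not black-marked.
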
   Techniques that
   retain some information while reducing (though not eliminating) the
   disclosure risk are discussed in the following sections; the
   techniques specifically applicable to IP addresses, timestamps, and
   counters are each discussed in a separate section.

4.1.  IP Address Anonymisation

   The following table gives an overview of the schemes for IP address
   anonymisation described in this document and their categorisation.

   +-------------------------------+-------------------+---------------+
   | Scheme                        | Action            | Reversibility |
   +-------------------------------+-------------------+---------------+
   | Truncation                    | Generalisation    | N             |
   | Random Permutation            | Direct            | Y/N           |
   |                               | Substitution      |               |
   | Prefix-preserving             | Direct            | Y             |
   | Pseudonymisation              | Substitution      |               |
   +-------------------------------+-------------------+---------------+

   Note that random permutations might be either reversible or not,
   depending on the function used.

4.1.1.  Truncation

   Truncation removes the "n" least significant bits from an IP address
   (i.e., sets them to zero).  Note that truncating 8 bits of an IPv4
   address replaces it with the corresponding /24 (class C) network
   address.

4.1.2.  Random Permutation

   Random permutation replaces each IP address with a unique address
   randomly selected from the set of possible IP addresses.  The
   permutation function can be implemented using a hash table to ensure
   uniqueness.

4.1.3.  Prefix-preserving Pseudonymisation

   Prefix-preserving pseudonymisation preserves the structure of subnets
   at each level while anonymising IP addresses.  If two real IP
   addresses match on a prefix of "n" bits, the two anonymised IP
   addresses will match on a prefix of "n" bits as well.

4.2.  Timestamp Anonymisation

   [TODO: introductory text]

   +-----------------------+---------------------------+---------------+
   | Scheme                | Action                    | Reversibility |
   +-----------------------+---------------------------+---------------+
   | Precision Degradation | Generalisation            | N             |
   | Enumeration           | Direct or Set             | Y             |
   |                       | Substitution              |               |
   | Random Time Shifts    | Direct Substitution       | Y             |
   +-----------------------+---------------------------+---------------+

4.2.1.  Precision Degradation

   Precision Degradation removes the most precise components of a
   timestamp, treating all events occurring in each given interval
   (e.g., one millisecond for millisecond-level degradation) as
   simultaneous.  This has the effect of potentially collapsing many
   timestamps into one.  With this technique time precision is reduced,
   and sequencing may be lost, but the information about when the event
   occurred is preserved.

4.2.2.  Enumeration

   Enumeration keeps the chronological order in which events occurred
   while eliminating time information.  Timestamps are substituted by
   equidistant timestamps (or numbers) starting from a randomly chosen
   start value.

4.2.3.  Random Time Shifts

   Random Time Shifts keep the information on how far apart two events
   are from each other.  This is achieved by shifting all timestamps by
   the same random number.  Note that random time shifts also preserve
   chronological order.

4.3.  Counter Anonymisation

   Counters (such as packet and octet volumes per flow) are subject to
   fingerprinting and injection attacks against anonymisation, as
   timestamps are, but relative magnitudes of activity can be useful for
   certain analysis tasks.
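
   The three timestamp schemes of Section 4.2 can be sketched as
   follows.  This is a minimal Python illustration under stated
   assumptions, not a normative definition: timestamps are taken to be
   integer milliseconds, the function and parameter names are
   hypothetical, and the enumeration shown maps equal timestamps to the
   same value (the direct substitution variant).

```python
import random


def degrade_precision(ts_ms, interval_ms):
    """Precision degradation: map every timestamp in an interval to
    the start of that interval."""
    return ts_ms - (ts_ms % interval_ms)


def enumerate_timestamps(timestamps, step=1, start=None, rng=random):
    """Enumeration: keep chronological order, replace times with
    equidistant values counted from a (by default random) start."""
    if start is None:
        start = rng.randrange(1_000_000)
    order = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [start + step * order[t] for t in timestamps]


def random_shift(timestamps, max_shift_ms, rng=random):
    """Random time shift: add one random offset to all timestamps,
    preserving the distance between any two events."""
    shift = rng.randint(-max_shift_ms, max_shift_ms)
    return [t + shift for t in timestamps]


ts = [1000, 1250, 1250, 4000]
degraded = [degrade_precision(t, 1000) for t in ts]
enumerated = enumerate_timestamps(ts, start=100)
shifted = random_shift(ts, 3_600_000, random.Random(1))
```

   Note that only the random shift keeps inter-event distances, and only
   precision degradation keeps an approximation of absolute time; this
   is the trade-off an operator selects among when choosing a scheme.
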
   [TODO: more intro text]

   +-----------------------+---------------------------+---------------+
   | Scheme                | Action                    | Reversibility |
   +-----------------------+---------------------------+---------------+
   | Precision Degradation | Generalisation            | N             |
   | Binning               | Generalisation            | N             |
   | Random Noise Addition | Direct or Set             | N             |
   |                       | Substitution              |               |
   +-----------------------+---------------------------+---------------+

4.3.1.  Precision Degradation

   As with precision degradation in timestamps, precision degradation of
   counters removes lower-order bits of the counters, treating all the
   counters in a given range as having the same value.  Depending on the
   precision reduction, this loses information about the relationships
   between sizes of similarly-sized flows, but keeps relative magnitude
   information.

4.3.2.  Binning

   Binning can be seen as a special case of precision degradation; the
   operation is identical, except that in precision degradation the
   counter ranges are uniform, and in binning they need not be.  For
   example, a common counter binning scheme for packet counters could be
   to bin values 1-2 together, and 3-infinity together, thereby
   separating potentially completely-opened TCP connections from
   unopened ones.  Binning schemes are generally chosen to keep
   precisely the amount of information required in a counter for a given
   analysis task.

4.3.3.  Random Noise Addition

   Random noise addition adds a random amount to a counter in each flow;
   this is used to keep relative magnitude information and minimize the
   disruption to size relationship information while avoiding
   fingerprinting attacks against anonymisation.

4.4.  Anonymisation of Other Flow Fields

   [TODO: as section 4.1]

5.  Applying Anonymisation Techniques to IPFIX Export and Storage

   When exporting or storing anonymised flow data using IPFIX, certain
   interactions between the IPFIX Protocol and the anonymisation
   techniques in use must be considered; these are treated in the
   subsections below.

5.1.  Arrangement of Processes in IPFIX Anonymisation

   Anonymisation may be applied to IPFIX data at three stages within the
   collection infrastructure: on initial export, at a mediator, or after
   collection, as shown in Figure 1.  Each of these locations has
   specific considerations and applicability.

             +--------------------+
             | IPFIX File Storage |
             +--------------------+
                       ^
                       |  (Anonymised after collection)
                       |
   +=======================================+
   |          Collecting Process           |
   +=======================================+
     ^                                ^
     | (Anonymised at mediator)       |
     |                                |
   +=============================+    |
   |          Mediator           |    |
   +=============================+    |
     ^                                |
     | (Anonymised on initial export) |
     |                                |
   +=======================================+
   |           Exporting Process           |
   +=======================================+

               Figure 1: Potential Anonymisation Locations

   Anonymisation is generally performed before the wider dissemination
   or repurposing of a flow data set, e.g., adapting operational
   measurement data for research.  Therefore, direct anonymisation of
   flow data on initial export is only applicable in certain restricted
   circumstances: when the Exporting Process is "publishing" data to a
   Collecting Process directly, and the Exporting Process and Collecting
   Process are operated by different entities.  Note that certain
   guidelines in Section 5.2.1 with respect to timestamp anonymisation
   may not apply in this case, as the Collecting Process may be able to
   deduce certain timing information from the time at which each Message
   is received.

   A much more flexible arrangement is to anonymise data within a
   Mediator [I-D.ietf-ipfix-mediators-framework].  Here, original data
   is sent to a Mediator, which performs the anonymisation function and
   re-exports the anonymised data.  Such a Mediator could be located at
   the administrative domain boundary of the initial Exporting Process
   operator, exporting anonymised data to other consumers outside the
   organisation.  In this case, the original Exporter SHOULD use TLS as
   specified in [RFC5101] to secure the channel to the Mediator, and the
   Mediator should follow the guidelines in Section 5.2, to mitigate the
   risk of original data disclosure.

   When data is to be published as an anonymised data set in an IPFIX
   File [I-D.ietf-ipfix-file], the anonymisation may also be done at the
   final Collecting Process before storage and dissemination.  In this
   case, the Collector should follow the guidelines in Section 5.2,
   especially as regards File-specific Options in Section 5.2.2.

   Note that anonymisation may occur at more than one location within a
   given collection infrastructure, to provide varying levels of
   anonymisation reversal risk and utility for specific purposes.

5.2.  IPFIX-Specific Anonymisation Guidelines

   In implementing and deploying the anonymisation techniques described
   in this document, care must be taken that data structures supporting
   the operation of the protocol itself do not leak data that could be
   used to reverse the anonymisation applied to the flow data.  Such
   data structures may appear in the header, or within the data stream
   itself, especially as options data.  Each of these and their impact
   on specific anonymisation techniques is noted in a separate
   subsection below.

5.2.1.  Anonymisation of Header Data

   Each IPFIX Message contains a Message Header; within this Message
   Header are contained two fields which may be used to break certain
   anonymisation techniques: the Export Time, and the Observation Domain
   ID.

   Exporting Processes that export IPFIX Messages containing anonymised
   timestamp data, where the original Export Time Message Header field
   has some relationship to the anonymised timestamps, SHOULD anonymise
   the Export Time header field using an equivalent technique, if
   possible.  Otherwise, relationships between export and flow time
   could be used to partially or totally reverse timestamp
   anonymisation.

   The similarity in size between an Observation Domain ID and an IPv4
   address (32 bits) may lead to a temptation to use an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID.  If this address bears some relation to the IP addresses
   in the flow data (e.g., shares a network prefix with internal
   addresses) and the IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Observation Domain ID may be used
   to break the IP address anonymisation.  Use of an IPv4 interface
   address on the Metering or Exporting Process as the Observation
   Domain ID is NOT RECOMMENDED in this case.

   [EDITOR'S NOTE: We might want to see if anyone is actually doing this
   with IPFIX.  The example comes from other network measurement tools
   (e.g. Argus) which default to using an IPv4 address as a sensor ID.]

5.2.2.  Anonymisation of Options Data

   IPFIX uses the Options mechanism to export, among other things,
   metadata about exported flows and the flow collection infrastructure.
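
   The Export Time guidance in Section 5.2.1 can be illustrated with a
   short Python sketch.  It assumes that the flow timestamps were
   anonymised with a random time shift, so that applying the same shift
   (here in seconds) to the header is an "equivalent technique"; the
   names HEADER and shift_export_time are hypothetical.  The 16-octet
   Message Header layout (Version, Length, Export Time, Sequence Number,
   Observation Domain ID) is that of [RFC5101].

```python
import struct

# IPFIX Message Header: version, length, export time (seconds),
# sequence number, observation domain ID; all in network byte order.
HEADER = struct.Struct("!HHIII")


def shift_export_time(message, shift_seconds):
    """Return a copy of an IPFIX Message whose Export Time is shifted
    by the same offset that was applied to the flow timestamps."""
    version, length, export_time, seq, domain = HEADER.unpack_from(message)
    if version != 10:
        raise ValueError("not an IPFIX Message")
    new_time = (export_time + shift_seconds) % 2**32
    return (HEADER.pack(version, length, new_time, seq, domain)
            + message[HEADER.size:])


# Header-only example Message exported at 2009-01-12 00:00:00 UTC,
# shifted one day into the past.
msg = HEADER.pack(10, 16, 1231718400, 7, 1)
out = shift_export_time(msg, -86400)
```

   Other timestamp schemes would need a correspondingly different
   treatment of the header field; the sketch covers only the shifted
   case.
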
   As with the IPFIX Message Header, certain Options recommended in
   [RFC5101] and the IPFIX File Format [I-D.ietf-ipfix-file] containing
   flow timestamps and network addresses of Exporting and Collecting
   Processes may be used to break certain anonymisation techniques; care
   should be taken while using them with anonymised data export and
   storage.

   The Exporting Process Reliability Statistics Options Template,
   recommended in [RFC5101], contains an Exporting Process ID field,
   which may be an exportingProcessIPv4Address Information Element or an
   exportingProcessIPv6Address Information Element.  If the Exporting
   Process address bears some relation to the IP addresses in the flow
   data (e.g., shares a network prefix with internal addresses) and the
   IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Exporting Process address may be
   used to break the IP address anonymisation.  Exporting Processes
   exporting anonymised data in this situation SHOULD mitigate the risk
   of attack either by omitting Options described by the Exporting
   Process Reliability Statistics Options Template, or by anonymising
   the Exporting Process address using a similar technique to that used
   to anonymise the IP addresses in the exported data.

   Similarly, the Export Session Details Options Template and Message
   Details Options Template specified for the IPFIX File Format
   [I-D.ietf-ipfix-file] may contain the exportingProcessIPv4Address
   Information Element or the exportingProcessIPv6Address Information
   Element to identify an Exporting Process from which a flow record was
   received, and the collectingProcessIPv4Address Information Element or
   the collectingProcessIPv6Address Information Element to identify the
   Collecting Process which received it.
   If the Exporting Process or
   Collecting Process address bears some relation to the IP addresses in
   the flow data (e.g., shares a network prefix with internal addresses)
   and the IP addresses in the flow data are anonymised in a
   structure-preserving way, then the Exporting Process or Collecting
   Process address may be used to break the IP address anonymisation.
   Since these Options Templates are primarily intended for storing
   IPFIX Transport Session data for auditing, replay, and testing
   purposes, it is NOT RECOMMENDED that they be included when storing
   anonymised data; omitting them mitigates the risk of attack.

   The Message Details Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] also contains the
   collectionTimeMilliseconds Information Element.  As with the Export
   Time Message Header field, if the exported flow data contains
   anonymised timestamp information, and the collectionTimeMilliseconds
   Information Element in a given Message has some relationship to the
   anonymised timestamp information, then this relationship can be
   exploited to reverse the timestamp anonymisation.  Since this Options
   Template is primarily intended for storing IPFIX Transport Session
   data for auditing, replay, and testing purposes, it is NOT
   RECOMMENDED that it be included when storing anonymised data;
   omitting it mitigates the risk of attack.

   Since the Time Window Options Template specified for the IPFIX File
   Format [I-D.ietf-ipfix-file] refers to the timestamps within the flow
   data to provide partial table of contents information for an IPFIX
   File, care must be taken to ensure that Options described by this
   template are written using the anonymised timestamps instead of the
   original ones.

6.  Parameters for the Description of Anonymisation Techniques

   [TODO: see corresponding section of draft-ietf-psamp-sample-tech for
   the proposed structure of this section.]

7.  Anonymisation Metadata Support in IPFIX

   [TODO: Here we'll describe how the information specified above can be
   transmitted on the wire using an option template.  The idea is to
   scope the option to the Template ID and for each field specify which
   are anonymised, providing info on the output characteristics of the
   technique, and which ones aren't.]

   [EDITOR'S NOTE: Multiple anon. techniques applied on an IE at the
   same time is indicated with multiple elements of the same type (in
   application order as in PSAMP)]

   [EDITOR'S NOTE: for blackmarking we'll recommend not to export the
   information at all following the data protection law principle that
   only necessary information should be exported.]

8.  Security Considerations

   [TODO: write this section.]

9.  IANA Considerations

   This document contains no actions for IANA.

10.  Acknowledgments

   We thank Paul Aitken for his comments and insight, and the PRISM
   project for its support of this work.

11.  References

11.1.  Normative References

   [RFC5101]  Claise, B., "Specification of the IP Flow Information
              Export (IPFIX) Protocol for the Exchange of IP Traffic
              Flow Information", RFC 5101, January 2008.

   [RFC5102]  Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
              Meyer, "Information Model for IP Flow Information Export",
              RFC 5102, January 2008.

11.2.  Informative References

   [I-D.ietf-ipfix-as]
              Zseby, T., "IPFIX Applicability", draft-ietf-ipfix-as-12
              (work in progress), July 2007.

   [I-D.ietf-ipfix-architecture]
              Sadasivan, G., "Architecture for IP Flow Information
              Export", draft-ietf-ipfix-architecture-12 (work in
              progress), September 2006.

   [I-D.ietf-ipfix-file]
              Trammell, B., Boschi, E., Mark, L., Zseby, T., and A.
              Wagner, "Specification of the IPFIX File Format",
              draft-ietf-ipfix-file-03 (work in progress), October 2008.

   [I-D.ietf-ipfix-mediators-framework]
              Kobayashi, A., Nishida, H., and B. Claise, "IPFIX
              Mediation: Framework",
              draft-ietf-ipfix-mediators-framework-01 (work in
              progress), November 2008.

   [RFC3917]  Quittek, J., Zseby, T., Claise, B., and S. Zander,
              "Requirements for IP Flow Information Export (IPFIX)",
              RFC 3917, October 2004.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

Authors' Addresses

   Elisa Boschi
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 57
   Email: elisa.boschi@hitachi-eu.com


   Brian Trammell
   Hitachi Europe
   c/o ETH Zurich
   Gloriastrasse 35
   8092 Zurich
   Switzerland

   Phone: +41 44 632 70 13
   Email: brian.trammell@hitachi-eu.com