MARF Working Group                                          J. Falk, Ed.
Internet-Draft                                               Return Path
Intended status: Standards Track                       M. Kucherawy, Ed.
Expires: July 30, 2012                                         Cloudmark
                                                        January 27, 2012


    Redaction of Potentially Sensitive Data from Mail Abuse Reports
                      draft-ietf-marf-redaction-08

Abstract

   Email messages often contain information that might be considered
   private or sensitive, per either regulation or social norms. When
   such a message becomes the subject of a report intended to be shared
   with other entities, the report generator may wish to redact or elide
   the sensitive portions of the message.
   This memo suggests one method for doing so effectively.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 30, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Keywords
   3.  Recommended Practice
   4.  Transformation Mechanisms
   5.  Security Considerations
     5.1.  General
     5.2.  Digest Collisions
     5.3.  Information Not Redacted
   6.  Privacy Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Appendix A.  Example
   Appendix B.  Acknowledgements
   Authors' Addresses

1.  Introduction

   [ARF] defines a message format for sending reports of abuse in the
   messaging infrastructure, with an eye toward automating both the
   generation and consumption of those reports.

   For privacy reasons, it might be the policy of a report generator to
   anonymize, or obscure, portions of the report that might identify an
   end user who caused the report to be generated. This has come to be
   known in feedback loop parlance as "redaction". Precisely how this
   is done is unspecified in [ARF], as it will generally be a matter of
   local policy. That specification does admonish generators against
   being overzealous with this practice, as obscuring too much data
   makes the report non-actionable.

   Previous redaction practices, such as replacing local-parts of
   addresses with a uniform string like "xxxxxxxx", frustrated any kind
   of prioritizing or grouping of reports. This memo presents a
   practice for conducting redaction in a manner that allows a report
   receiver to detect that two reports were caused by the same end user
   without revealing the identity of that user. That is, when two
   reports carry the same redacted string, such as an obscured email
   address, the report receiver can conclude that the corresponding
   unredacted strings were identical, i.e., that the reports originally
   contained the same address.
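
   As a non-normative illustration of the grouping described above, the
   following Python sketch counts reports per redacted recipient
   string. The function name and the sample values are hypothetical
   and are not defined by [ARF]; the sketch assumes the receiver has
   already extracted the redacted address from each report.

```python
from collections import Counter


def count_reports_by_recipient(redacted_recipients):
    """Count reports per redacted recipient string.

    Identical redacted strings imply identical original addresses
    (barring collisions), so the counts group reports by end user
    without revealing who that user is.
    """
    return Counter(redacted_recipients)


# Hypothetical redacted values, as a receiver might see them.
sample = ["AAAA@example.net", "BBBB@example.net", "AAAA@example.net"]
counts = count_reports_by_recipient(sample)
```

   With a uniform replacement such as "xxxxxxxx", every report would
   fall into a single group, which is why that practice prevents
   prioritization.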

   Generally, it is assumed that the recipient-identifying fields of a
   message, when copied into a report, are to be obscured to protect the
   identity of the end user who submitted the complaint about the
   message. However, it is also presumed that other data will be left
   intact, and those data could be correlated against log files or other
   resources to determine the intended recipient of the original
   message.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

3.  Recommended Practice

   When redaction of reports is desired, in order to enable a report
   receiver to correlate reports that might refer to a common but
   anonymous source, the report generator SHOULD use the following
   practice:

   1.  Select a transformation mechanism (see Section 4) that is
       consistent (i.e., the same input string produces the same output
       each time) and reasonably collision-resistant (i.e., two
       different inputs are unlikely to produce the same output).
   2.  Identify string(s) (such as local-parts of email addresses) in a
       message that need to be redacted. Call these strings the
       "private data".
   3.  For each piece of private data, apply the selected transformation
       mechanism.
   4.  If the output of the transformation can contain bytes that are
       not printable ASCII, or if the output can include characters not
       appropriate to replace the private data directly, encode the
       output with the base64 algorithm as defined in Section 4 of
       [BASE64], or some similar translation to a form that is a valid
       replacement in the original context. For example, replacing a
       local-part in an email address with transformation output
       containing an "@" character (ASCII 0x40) or a space character
       (ASCII 0x20) is not permitted by the specification for
       local-part ([SMTP]), so the transformation output needs to be
       encoded as described.
   5.  Replace each instance of private data with the corresponding
       (possibly encoded) transformation when generating the report.
       Note that the replaced text could also be in a context that has
       constraints such as length limits that need to be observed.

   This has the effect of obscuring the data (in a potentially
   irreversible way) while still allowing the report recipient to
   observe that numerous reports are about one particular end user.
   Such detection enables the receiver to prioritize its reactions based
   on problems that appear to be focused on specific end users that may
   be under attack.

4.  Transformation Mechanisms

   This memo does not specify a particular transformation mechanism as a
   requirement. The interoperability that this memo seeks to provide is
   enabled by the consistency of the transformation.

   The security of the transformation, that is, how well it frustrates
   attempts to reverse it, is a matter of local policy. A continuum of
   possible transformations exists, from trivial ones such as rot13,
   CRC32 and base64, through strong cryptographic encodings such as
   [HMAC] and even full encryption, or private transformations such as
   mapping an email address to an internal customer number. An
   operator wishing to perform report redaction needs to select a
   consistent transformation that obscures the private data and is
   resilient to attempts to extract the original data to the extent
   required by local policy, keeping in mind that the environment in
   which the transformation is operating is not a highly secure one.
   See Section 5.3 for further details of this issue.
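
   As a non-normative sketch of the practice in Section 3, the following
   Python code applies one possible transformation, HMAC with SHA-256
   ([HMAC]), to the local-part of an address and base64-encodes the
   result. The choice of HMAC-SHA-256, the function names, and the key
   are assumptions made for illustration only; this memo does not
   require any particular mechanism.

```python
import base64
import hashlib
import hmac


def redact(private_data: str, key: bytes) -> str:
    """Return a consistent, printable replacement for private_data."""
    # Steps 1 and 3: a keyed digest; the same input and key always
    # produce the same output.
    digest = hmac.new(key, private_data.encode("utf-8"),
                      hashlib.sha256).digest()
    # Step 4: the raw digest is binary, so encode it with base64
    # (Section 4 of [BASE64]).
    return base64.b64encode(digest).decode("ascii")


def redact_address(address: str, key: bytes) -> str:
    """Step 5: replace only the local-part, keeping the domain intact."""
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("expected local-part@domain")
    return redact(local_part, key) + "@" + domain


# Hypothetical key; a real deployment would keep its key private.
example_key = b"hypothetical-redaction-key"
redacted = redact_address("bob@example.net", example_key)
```

   A base64-encoded SHA-256 digest is 44 characters long, which stays
   within the 64-octet limit that [SMTP] places on a local-part. Other
   contexts may impose different limits, as noted in step 5.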

   An implementation MAY choose any transformation that has a reasonably
   low likelihood of collision.

5.  Security Considerations

5.1.  General

   General security issues with respect to these reports are found in
   [ARF].

5.2.  Digest Collisions

   Message digest collisions are a well-understood issue. Their
   application here involves a report receiver improperly concluding
   that two pieces of redacted information were originally the same when
   in fact they are not. This can lead to a denial of service, where
   the inadvertently improper application of complaint data causes
   unjustified corrective action. Such cases are sufficiently unlikely
   as to be of little concern.

5.3.  Information Not Redacted

   Although the identity of the user causing a report to be generated
   can be obscured using this mechanism, other properties of a message
   (such as the Message-ID field) that are not redacted could be used to
   recover the original data by locating them in the message logs of the
   originating system or via other data correlation techniques. It is
   incumbent on the report generator to anticipate and redact or
   otherwise obscure such data, or accept that such recovery is possible
   even from the very simplest kinds of feedback.

   It is for this reason that the normative portions of this memo do not
   include stronger assertions about cryptography used in the
   transformation. Given the ultimate recoverability of the redacted
   information, the cryptographic strength of the transformation is not
   a critical security measure.

   The process of redacting a feedback report satisfies a privacy
   requirement established by local policy, and is not meant to provide
   strong security properties.

   [FBL-BCP] and Section 8 of [ARF] discuss topics related to
   establishment of bilateral agreements between report producers and
   consumers.
   The issues raised here are also things to be considered when
   establishing such agreements.

6.  Privacy Considerations

   While the method of redaction described in this document may reduce
   the likelihood of some types of private data leaking between
   Administrative Management Domains (ADMDs), it is extremely unlikely
   that report generation software could ever be created to recognize
   all of the different ways that private information could be
   expressed in written human language. If further protections are
   required, implementers may wish to consider establishing some sort
   of out-of-band arrangements between the relevant entities to contain
   private data as much as possible.

7.  IANA Considerations

   This memo includes no request to IANA.

   [RFC Editor note: This section may be removed prior to publication.]

8.  References

8.1.  Normative References

   [ARF]      Shafranovich, Y., Levine, J., and M. Kucherawy, "An
              Extensible Format for Email Feedback Reports", RFC 5965,
              August 2010.

   [BASE64]   Josefsson, S., "The Base16, Base32, and Base64 Data
              Encodings", RFC 4648, October 2006.

   [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [FBL-BCP]  Falk, J., "Complaint Feedback Loop Operational
              Recommendations", RFC 6449, November 2011.

   [HMAC]     Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
              Hashing for Message Authentication", RFC 2104,
              February 1997.

   [SMTP]     Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
              October 2008.

Appendix A.  Example

   Assume the following input message:

      From: alice@example.com
      To: bob@example.net
      Subject: Make money fast!
      Message-ID: <123456789@mailer.example.com>
      Date: Thu, 17 Nov 2011 22:19:40 -0500

      Want to make a lot of money really fast? Check it out!
      http://www.example.com/scam/0xd0d0cafe

   On receipt, bob@example.net reports this message as abusive through
   whatever mechanism his mailbox provider has established. This causes
   an [ARF] message to be generated. However, example.net wishes to
   obscure Bob's email address lest it be relayed to the offending
   agent, which could lead to more trouble for Bob.

   Thus, example.net plans to redact the local-part of the recipient
   address in the To: field. Local policy and security requirements
   suggest that the algorithm known as "H" (a hash of a key concatenated
   with the data to be obscured) using SHA1 is adequate. example.net
   has thus selected a redaction key of "potatoes", and the private data
   in this case is the string "bob". The concatenation "potatoesbob" is
   digested with SHA1 and then base64-encoded to the string
   "rZ8cqXWGiKHzhz1MsFRGTysHia4=".

   Therefore, when constructing the ARF message in response to Bob's
   complaint, the following form of the received message is used in the
   third part of the ARF report:

      From: alice@example.com
      To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
      Subject: Make money fast!
      Message-ID: <123456789@mailer.example.com>
      Date: Thu, 17 Nov 2011 22:19:40 -0500

      Want to make a lot of money really fast? Check it out!
      http://www.example.com/scam/0xd0d0cafe

   Note, however, that the redacted information could be recovered by
   agents at example.com searching their logs for the original envelope
   associated with the message, by correlating with the Message-ID
   contents, which were not redacted here. It is expected that feedback
   loops generating such reports involve senders that have been vetted
   against such information leakage.

Appendix B.  Acknowledgements

   Much of the text in this document was initially moved from other MARF
   working group documents, with contributions from Monica Chew, Tim
   Draegen, Michael Adkins, and other members of the Messaging
   Anti-Abuse Working Group. Additional feedback was provided by John
   Levine, S. Moonesamy, Alessandro Vesely, and Mykyta Yevstifeyev.

Authors' Addresses

   J.D. Falk (editor)
   Return Path
   100 Mathilda Place, Suite 100
   Sunnyvale, CA 94086
   US

   Email: ietf@cybernothing.org
   URI:   http://www.returnpath.net/


   M. Kucherawy (editor)
   Cloudmark
   128 King St., 2nd Floor
   San Francisco, CA 94107
   US

   Email: msk@cloudmark.com