Individual submission                                       M. Kucherawy
Internet-Draft                                           Cloudmark, Inc.
Intended status: BCP                                       July 11, 2011
Expires: January 12, 2012


        Best Current Practices for Handling of Malformed Messages
                    draft-kucherawy-mta-malformed-03

Abstract

   The email ecosystem has long had a very permissive set of common
   processing rules in place, despite increasingly rigid standards
   governing its components, ostensibly to improve the user experience.
   Handling non-conforming messages comes at some cost, and various
   components are faced with decisions about whether to permit such
   messages to continue toward their destinations unaltered, to adjust
   them to conform (possibly at the cost of losing some of the original
   message), or to reject them outright.

   This memo includes a collection of the best current practices in a
   variety of such situations, to be used as implementation guidance.
   It must be emphasized, however, that the intent of this memo is not
   to standardize malformations or otherwise encourage their
   proliferation.  The messages that are the subject of this memo are
   manifestly malformed, and the code and culture that generate them
   need to be fixed.  Nevertheless, many malformed messages from
   otherwise legitimate senders are in circulation and will be for some
   time, and, unfortunately, commercial reality shows that we cannot
   simply reject or discard them.  Accordingly, this memo presents
   recommendations for dealing with them in ways that seem to do the
   least additional harm until the infrastructure is tightened up to
   match the standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 12, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Purpose Of This Work
     1.2.  Not The Purpose Of This Work
   2.  Keywords
   3.  Background
   4.  Internal Representations
   5.  Mail Submission Agents
   6.  Header Anomalies
     6.1.  Non-Header Lines
     6.2.  Header Malformations
     6.3.  Header Field Counts
   7.  MIME Anomalies
     7.1.  Missing MIME-Version Field
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Examples
   Appendix B.  Acknowledgements

1.  Introduction

1.1.  The Purpose Of This Work

   The history of email standards, going back to [RFC822] and beyond,
   contains a fairly rigid evolution of specifications.  But
   implementations within that culture have also long had an
   undercurrent known formally as the robustness principle and
   informally as Postel's Law: "Be conservative in what you do, be
   liberal in what you accept from others."

   In general, this served the email ecosystem well by allowing a few
   errors in implementations without obstructing participation in the
   game.  The proverbial bar was set low.  However, as we have evolved
   into the current era, some of these lenient stances have begun to
   expose opportunities that can be exploited by malefactors.  Various
   email-based applications rely on strict application of these
   standards for simple security checks, while the very basic building
   blocks of that infrastructure, intended to be robust, fail utterly
   to enforce those standards.

   This memo presents some areas in which the more lenient stances can
   provide vectors for attack, and then presents the collected wisdom of
   numerous applications in and around the email ecosystem for dealing
   with them to mitigate their impact.

1.2.  Not The Purpose Of This Work

   It is important to understand that this work is not an effort to
   endorse or standardize certain common malformations.
   The code and culture that introduce such messages into the mail
   stream need to be repaired, as the security penalty now being paid
   for this lax processing arguably outweighs the reduction in support
   costs to end users who are not expected to understand the standards.
   However, the reality is that this will not be fixed quickly.

   Given this, it is beneficial to provide implementers with guidance
   about the safest or most effective way to handle malformed messages
   when they arrive, taking into consideration the tradeoffs of the
   choices available, especially with respect to how various actors in
   the email ecosystem respond to such messages in terms of handling,
   parsing, or rendering to end users.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

3.  Background

   The reader would benefit from reading [EMAIL-ARCH] for some general
   background about the overall email architecture.  Of particular
   interest is the Internet Message Format, detailed in [MAIL].
   Throughout this document, the term "message" should be assumed to
   mean a block of text conforming to the Internet Message Format.

4.  Internal Representations

   Any agent handling a message could have two or more distinct
   representations of that message.  One is an internal
   representation, such as a block of storage used for the header and a
   block for the body.  These may be sorted, encoded, decoded, etc.,
   according to the needs of that particular module.  The other is the
   representation that is output to the next agent in the handling
   chain.
   This might be identical to the version that is input to the
   module, or it might have some changes such as added or reordered
   header fields, body modifications to remove malicious content, etc.

   In some cases, advice is provided only for internal representations.
   However, there is often occasion to mandate changes to the output as
   well.

5.  Mail Submission Agents

   Within the email context, the single most influential component that
   can reduce the presence of malformed items in the email system is the
   Mail Submission Agent (MSA).  This is the component that is
   essentially the interface between the end users who create content
   and the mail stream.

   The lax processing described earlier in the document creates a high
   support and security cost overall.  Thus, MSAs MUST evolve to become
   more strict about enforcement of all relevant email standards,
   especially [MAIL] and the [MIME] family of documents.

   Relay Mail Transfer Agents (MTAs) SHOULD also be more strict.
   Although preventing the dissemination of malformed messages is
   desirable, rejecting such mail once it is already in transit also
   has a support cost, namely the generation of a Delivery Status
   Notification [DSN] that many end users might not understand.

6.  Header Anomalies

   This section covers common syntactic and semantic anomalies found in
   the headers of messages, and presents preferred mitigations.

6.1.  Non-Header Lines

   It has been observed that some messages contain a line of text in the
   header that is not a valid message header field of any kind.  For
   example:

      From: user@example.com
      To: userpal@example.net
      Subject: This is your reminder
      about the football game tonight
      Date: Wed, 20 Oct 2010 20:53:35 -0400

      Don't forget to meet us for the tailgate party!

   The cause of this is typically a bug in a message generator of some
   kind.
   If the fourth line was intended to be a continuation of the third,
   it should have been indented by whitespace as set out in Section
   2.2.3 of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (i.e., a CRLF immediately
       followed by another CRLF);

   2.  some agents assume this anomaly should be interpreted to mean the
       body starts at line four, as the end of the header is inferred
       from encountering something that is not a valid header field or
       folded portion thereof;

   3.  some agents assume this should be interpreted as an intended
       header folding as described above;

   4.  some agents reject this outright, as line four is neither a valid
       header field nor a folded continuation of a header field prior to
       an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  For example, a filter trained to detect malicious body
   anomalies (e.g., references to dangerous web sites) that is fed by a
   Mail Transfer Agent (MTA) implementing (1) above might not get the
   opportunity to identify something dangerous in a message if it is
   unaware of the anomaly and does not itself check for it.

   Consensus indicates the preferred implementation is to terminate
   header processing before the first character in line four, as
   described in (2) above.  Thus, a module compliant with this
   specification MUST terminate header processing upon encountering the
   first line of text that is not a valid header field.  That is, all
   data after that point in the input MUST NOT be considered part of the
   header of the message.  If that line is not an empty line, an empty
   line MUST be inserted at that point in the emitted version of the
   message being processed.
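
   The rule above can be sketched in Python as follows.  This is an
   illustrative sketch only, assuming the message is held in memory as a
   single string with CRLF line endings; the function names
   terminate_header() and emit() and the regular expression used to
   recognize a header field line are hypothetical choices made for this
   example, not part of this specification.  A folded continuation line
   (one beginning with SP or HTAB) is treated as part of the preceding
   field; bare LF line endings, line length limits, and non-ASCII input
   are not handled.

```python
import re

# Hypothetical recognizer for a header field line: a field name made of
# printable US-ASCII other than ":" (RFC 5322 "ftext"), optionally
# followed by the obsolete whitespace before the colon (see Section 6.2).
FIELD_LINE = re.compile(r"^[\x21-\x39\x3b-\x7e]+[ \t]*:")


def terminate_header(raw: str) -> tuple[list[str], str]:
    """Split a CRLF-delimited message into header lines and a body.

    Header processing stops at the first line that is neither a header
    field nor a folded continuation of one (choice (2) in Section 6.1).
    """
    lines = raw.split("\r\n")
    header = []
    for i, line in enumerate(lines):
        if FIELD_LINE.match(line):
            header.append(line)
        elif header and line[:1] in (" ", "\t"):
            header.append(line)  # folded continuation of the previous field
        else:
            # An empty line is the normal separator and is consumed;
            # any other line becomes the first line of the body.
            start = i + 1 if line == "" else i
            return header, "\r\n".join(lines[start:])
    return header, ""  # header only: no separator and no body


def emit(raw: str) -> str:
    """Re-serialize the message, always with an empty line after the header."""
    header, body = terminate_header(raw)
    return "\r\n".join(header) + "\r\n\r\n" + body
```

   Applied to the example message above, terminate_header() returns the
   From, To, and Subject fields as the header, and emit() places an
   empty line before "about the football game tonight", so the Date
   field is passed on as body text, consistent with choice (2).  A
   correctly folded message passes through emit() unchanged.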

   It should be noted that a few implementations make choice (4) above,
   since any reputable message generation program will get header
   folding right, and thus anything so blatant as this malformation is
   likely the work of a malefactor.

6.2.  Header Malformations

   Various other header malformations exist.  A common one is the
   insertion of whitespace at unusual locations, such as:

      From: user@example.com
      To: userpal@example.net
      Subject: This is your reminder
      MIME-Version : 1.0
      Content-Type: text/plain
      Date: Wed, 20 Oct 2010 20:53:35 -0400

      Don't forget to meet us for the tailgate party!

   Note the addition of whitespace in line four after the header field
   name but before the colon that separates the name from the value.

   The acceptance grammar of [MAIL] permits that extra whitespace, so it
   cannot be considered invalid.  However, a consensus of
   implementations prefers to remove that whitespace.  There is no
   perceived change to the semantics of the header field being altered,
   as the whitespace is itself semantically meaningless.  Thus, a module
   compliant with this memo MUST remove all whitespace after the field
   name but before the colon, and MUST emit that version of that field
   on output.

6.3.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these: a message whose
   header exceeds one or more of the limits set there is generally
   allowed to pass, although agents may add any required fields that
   are missing.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.
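
   The validation step that few agents perform can be sketched in Python
   as follows.  This is an illustrative sketch under stated assumptions:
   the header has already been split into lines (for example, as
   described in Section 6.1), field names are compared
   case-insensitively with any whitespace before the colon disregarded
   (as Section 6.2 requires), and the function name
   field_count_violations() and its dictionary result are hypothetical.
   Only the once-only fields from the table in Section 3.6 of [MAIL] are
   checked; fields that may legitimately repeat, such as Comments,
   Keywords, the Resent-* fields, and trace fields, are ignored.

```python
import re
from collections import Counter

# Fields that RFC 5322 Section 3.6 allows at most once per message.
AT_MOST_ONCE = {"date", "from", "sender", "reply-to", "to", "cc", "bcc",
                "message-id", "in-reply-to", "references", "subject"}
# Of those, the fields that must also appear at least once.
REQUIRED = {"date", "from"}

# Captures the field name; any whitespace before the colon is not captured.
FIELD_NAME = re.compile(r"^([\x21-\x39\x3b-\x7e]+)[ \t]*:")


def field_count_violations(header_lines: list[str]) -> dict[str, int]:
    """Map each once-only field name with an invalid count to that count."""
    counts = Counter()
    for line in header_lines:
        match = FIELD_NAME.match(line)
        if match:  # folded continuation lines do not start a new field
            counts[match.group(1).lower()] += 1
    return {
        name: counts[name]
        for name in sorted(AT_MOST_ONCE)
        if counts[name] > 1 or (name in REQUIRED and counts[name] == 0)
    }
```

   For a header containing "From: a@example.com" and
   "From : b@example.com" but no Date field, the function returns
   {"date": 0, "from": 2}; an agent would then apply whichever of the
   enforcement options listed later in this section it implements.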

   Two popular open source filtering programs and two popular Mailing
   List Management (MLM) packages examined at the time this memo was
   drafted select either the first or the last instance of a particular
   field name, such as From, to decide who sent a message.  Absent
   enforcement of [MAIL], an attacker can craft a message with multiple
   instances of such a field if that attacker knows the filter will make
   a decision based on one but the user will be shown the other.

   This situation is exacerbated when a claim of message validity is
   inferred from something like a valid [DKIM] signature.  Such a
   signature might cover one instance of a constrained field but not
   another, and a naive consumer of DKIM's output, not realizing which
   one was covered by a valid signature, could presume the wrong one was
   the "good" one.  An MUA, for example, could show the first of two
   From fields as "good" or "safe" while the DKIM signature actually
   only verified the second.

   Thus, an agent compliant with this specification MUST enact one of
   the following:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its output, so
       that later agents can produce a consistent result.

7.  MIME Anomalies

   [MIME] et seq. define a mechanism of message extensions for
   providing text in character sets other than ASCII, non-text
   attachments to messages, multi-part message bodies, and similar
   facilities.

   Some anomalies in the generation of MIME content are also common.
   This section discusses some of those and presents preferred
   mitigations.

7.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a
   MIME-Version header field.  Without it, the Content-Type and
   associated fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.

   As described at the end of Section 6.1, this is not expected from a
   reputable content generator and is often an indication of
   mass-produced spam or other undesirable messages.

   Therefore, an agent compliant with this specification MUST internally
   enact one or more of the following in the absence of a MIME-Version
   header field:

   1.  Ignore all other MIME-specific fields, even if they are
       syntactically valid, thus treating the entire message as a
       single-part message of type text/plain;

   2.  Remove all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message;

   3.  Rename all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message.

8.  IANA Considerations

   This memo contains no actions for IANA.

9.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practices enumerated in
   this memo are generally perceived to resolve security considerations
   that already exist rather than introducing new ones.

10. References

10.1.  Normative References

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
                 October 2008.

10.2.  Informative References

   [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
                 J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
                 Signatures", RFC 4871, May 2007.

   [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
                 Format for Delivery Status Notifications", RFC 3464,
                 January 2003.

   [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
                 July 2009.

   [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC822]      Crocker, D., "Standard for the Format of Internet Text
                 Messages", RFC 822, August 1982.

Appendix A.  Examples

   Examples, if needed, can go here.

Appendix B.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: (names)

Author's Address

   Murray S. Kucherawy
   Cloudmark, Inc.
   128 King St., 2nd Floor
   San Francisco, CA 94107
   US

   Phone: +1 415 946 3800
   EMail: msk@cloudmark.com