Individual submission                                       M. Kucherawy
Internet-Draft                                           Cloudmark, Inc.
Intended status: BCP                                       July 11, 2011
Expires: January 12, 2012


        Best Current Practices for Handling of Malformed Messages
                    draft-kucherawy-mta-malformed-03

Abstract

   The email ecosystem has long had a very permissive set of common
   processing rules in place, ostensibly to improve the user experience,
   despite increasingly rigid standards governing its components.
   Handling the resulting non-conforming messages comes at some cost,
   and various components are faced with decisions about whether to
   permit such messages to continue toward their destinations
   unaltered, adjust them to conform (possibly at the cost of losing
   some of the original message), or reject them outright.

   This memo includes a collection of the best current practices in a
   variety of such situations, to be used as implementation guidance.

   It must be emphasized, however, that the intent of this memo is not
   to standardize malformations or otherwise encourage their
   proliferation.  The messages that are the subject of this memo are
   manifestly malformed, and the code and culture that generate them
   need to be fixed.  Nevertheless, many malformed messages from
   otherwise legitimate senders are in circulation and will be for some
   time and, unfortunately, commercial reality shows that we cannot
   simply reject or discard them.  Accordingly, this memo presents
   recommendations for dealing with them in ways that seem to do the
   least additional harm until the infrastructure is tightened up to
   match the standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

Kucherawy               Expires January 12, 2012                [Page 1]

Internet-Draft             Mailformed Mail BCP                 July 2011

   This Internet-Draft will expire on January 12, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Purpose Of This Work
     1.2.  Not The Purpose Of This Work
   2.  Keywords
   3.  Background
   4.  Internal Representations
   5.  Mail Submission Agents
   6.  Header Anomalies
     6.1.  Non-Header Lines
     6.2.  Header Malformations
     6.3.  Header Field Counts
   7.  MIME Anomalies
     7.1.  Missing MIME-Version Field
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Examples
   Appendix B.  Acknowledgements

1.  Introduction

1.1.  The Purpose Of This Work

   The history of email standards, going back to [RFC822] and beyond,
   contains a fairly rigid evolution of specifications.  But
   implementations within that culture have also long had an
   undercurrent known formally as the robustness principle and
   informally as Postel's Law: "Be conservative in what you do, be
   liberal in what you accept from others."

   In general, this served the email ecosystem well by allowing a few
   errors in implementations without obstructing participation in the
   game.  The proverbial bar was set low.  However, as we have evolved
   into the current era, some of these lenient stances have begun to
   expose opportunities that can be exploited by malefactors.  Various
   email-based applications rely on strong application of these
   standards for simple security checks, while the very basic building
   blocks of that infrastructure, intending to be robust, fail utterly
   to enforce those standards.

   This memo presents some areas in which the more lenient stances can
   provide vectors for attack, and then presents the collected wisdom of
   numerous applications in and around the email ecosystem for dealing
   with them to mitigate their impact.

1.2.  Not The Purpose Of This Work

   It is important to understand that this work is not an effort to
   endorse or standardize certain common malformations.
   The code and culture that introduce such messages into the mail
   stream need to be repaired, as the security penalty now being paid
   for this lax processing arguably outweighs the reduction in support
   costs to end users who are not expected to understand the standards.
   However, the reality is that this will not be fixed quickly.

   Given this, it is beneficial to provide implementers with guidance
   about the safest or most effective way to handle malformed messages
   when they arrive, taking into consideration the tradeoffs of the
   choices available, especially with respect to how various actors in
   the email ecosystem respond to such messages in terms of handling,
   parsing, or rendering to end users.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

3.  Background

   The reader would benefit from reading [EMAIL-ARCH] for some general
   background about the overall email architecture.  Of particular
   interest is the Internet Message Format, detailed in [MAIL].
   Throughout this document, the use of the term "message" should be
   assumed to mean a block of text conforming to the Internet Message
   Format.

4.  Internal Representations

   Any agent handling a message could have one or two (or more) distinct
   representations of that message.  One is an internal representation,
   such as a block of storage used for the header and a block for the
   body.  These may be sorted, encoded, decoded, etc., as per the needs
   of that particular module.  The other is the representation that is
   output to the next agent in the handling chain.  This might be
   identical to the version that is input to the module, or it might
   have some changes such as added or reordered header fields, body
   modifications to remove malicious content, etc.

   In some cases, advice is provided only for internal representations.
   However, there is often occasion to mandate changes to the output as
   well.

5.  Mail Submission Agents

   Within the email context, the single most influential component that
   can reduce the presence of malformed items in the email system is the
   Mail Submission Agent (MSA).  This is the component that is
   essentially the interface between end users that create content and
   the mail stream.

   The lax processing described earlier in the document creates a high
   support and security cost overall.  Thus, MSAs MUST evolve to become
   more strict about enforcement of all relevant email standards,
   especially [MAIL] and the [MIME] family of documents.

   Relay Mail Transfer Agents (MTAs) SHOULD also be more strict;
   although preventing the dissemination of malformed messages is
   desirable, the rejection of such mail already in transit also has a
   support cost, namely the creation of a [DSN] that many end users
   might not understand.

6.  Header Anomalies

   This section covers common syntactical and semantic anomalies found
   in headers of messages, and presents preferred mitigations.

6.1.  Non-Header Lines

   It has been observed that some messages contain a line of text in the
   header that is not a valid message header field of any kind.  For
   example:

      From: user@example.com
      To: userpal@example.net
      Subject: This is your reminder
      about the football game tonight
      Date: Wed, 20 Oct 2010 20:53:35 -0400

      Don't forget to meet us for the tailgate party!

   The cause of this is typically a bug in a message generator of some
   kind.  If the fourth line was intended to be a continuation of the
   third, it should be indented by whitespace as set out in Section
   2.2.3 of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (i.e., a CRLF immediately
       followed by another CRLF);

   2.  some agents assume this anomaly should be interpreted to mean the
       body starts at line four, since the end of the header is inferred
       upon encountering something that is not a valid header field or a
       folded portion thereof;

   3.  some agents assume this should be interpreted as an intended
       header folding as described above;

   4.  some agents reject this outright, as line four is neither a valid
       header field nor a folded continuation of a header field prior to
       an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  For example, a filter trained to detect malicious body
   anomalies (e.g., references to dangerous web sites) that is fed by a
   Mail Transfer Agent (MTA) implementing (1) above might not get the
   opportunity to identify something dangerous in a message if it is
   unaware of the anomaly and does not itself check for it, since any
   content between the non-header line and the first empty line is
   presented to it as part of the header.

   Consensus indicates the preferred implementation is to terminate
   header processing before the first character in line four, as
   described in (2) above.

   Thus, a module compliant with this specification MUST terminate
   header processing upon encountering the first line of text that is
   not a valid header field or a folded continuation of one.  That is,
   all data after that point in the input MUST NOT be considered part of
   the header of the message.  If that line is not an empty line, an
   empty line MUST be inserted at that point in the emitted version of
   the message being processed.

   It should be noted that a few implementations make choice (4) above,
   since any reputable message generation program will get header
   folding right, and thus anything so blatant as this malformation is
   likely an error caused by a malefactor.

6.2.  Header Malformations

   There are various malformations that exist.  A common one is
   insertion of whitespace at unusual locations, such as:

      From: user@example.com
      To: userpal@example.net
      Subject: This is your reminder
      MIME-Version : 1.0
      Content-Type: text/plain
      Date: Wed, 20 Oct 2010 20:53:35 -0400

      Don't forget to meet us for the tailgate party!

   Note the addition of whitespace in line four after the header field
   name but before the colon that separates the name from the value.

   The acceptance grammar of [MAIL] permits that extra whitespace, so it
   cannot be considered invalid.  However, a consensus of
   implementations prefers to remove that whitespace.  There is no
   perceived change to the semantics of the header field being altered,
   as the whitespace is itself semantically meaningless.  Thus, a module
   compliant with this memo MUST remove all whitespace after the field
   name but before the colon, and MUST emit that version of that field
   on output.

6.3.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these, in the sense that
   a message whose header contents exceed one or more limits set there
   is generally allowed to pass; they may add any required fields that
   are missing, however.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.  Two popular open source
   filtering programs and two popular Mailing List Management (MLM)
   packages examined at the time this memo was drafted select either the
   first or last instance of a particular field name, such as From, to
   decide who sent a message.  Absent enforcement of [MAIL], an attacker
   can craft a message with multiple instances of such a field if that
   attacker knows the filter will make a decision based on one instance
   but the user will be shown the other.
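
   As an illustration only, the following Python sketch shows one way
   an agent might detect a header that carries more instances of a
   field than allowed.  It is not taken from any of the implementations
   examined for this memo.  The names SINGLETON_FIELDS,
   header_field_names, and over_limit_fields are hypothetical, and the
   set of field names is an assumed reading of the fields that Section
   3.6 of [MAIL] limits to a single instance per message.  The sketch
   stops reading the header at the first line that is not a valid
   header field (Section 6.1) and tolerates whitespace before the colon
   (Section 6.2).

```python
import re
from collections import Counter

# Assumption: fields that Section 3.6 of RFC 5322 limits to at most one
# instance per message.  Trace and resent fields are not covered here.
SINGLETON_FIELDS = {
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
}

# A field name is one or more printable ASCII characters other than the
# colon.  Whitespace between the name and the colon is tolerated.
FIELD_START = re.compile(r"^([\x21-\x39\x3b-\x7e]+)[ \t]*:")


def header_field_names(raw: str) -> list[str]:
    """Return lowercased field names, in order, from the header of raw.

    Reading stops at the first empty line or at the first line that is
    neither a header field nor a folded continuation of one.
    """
    names = []
    for line in raw.splitlines():
        if line == "":
            break  # empty line: the normal end of the header
        if line[0] in " \t":
            if not names:
                break  # a continuation with no field to continue
            continue  # folded continuation of the previous field
        match = FIELD_START.match(line)
        if match is None:
            break  # not a header field: treat the header as ended
        names.append(match.group(1).lower())
    return names


def over_limit_fields(raw: str) -> dict[str, int]:
    """Map each single-instance field that appears more than once to its count."""
    counts = Counter(header_field_names(raw))
    return {
        name: count
        for name, count in counts.items()
        if name in SINGLETON_FIELDS and count > 1
    }
```

   For a message whose header carries two From fields, over_limit_fields
   returns {"from": 2}; an empty result means no single-instance limit
   in the assumed set was exceeded.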

   This situation is exacerbated when a claim of message validity is
   inferred by something like a valid [DKIM] signature.  Such a
   signature might cover one instance of a constrained field but not
   another, and a naive consumer of DKIM's output, not realizing which
   one was covered by a valid signature, might presume the wrong one was
   the "good" one.  An MUA, for example, could show the first of two
   From fields as "good" or "safe" while the DKIM signature actually
   only verified the second.

   Thus, an agent compliant with this specification MUST enact one of
   the following:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its output so
       that later agents can produce a consistent result.

7.  MIME Anomalies

   [MIME], et seq., define a mechanism of message extensions for
   providing text in character sets other than ASCII, non-text
   attachments to messages, multi-part message bodies, and similar
   facilities.

   Some anomalies with MIME-compliant generation are also common.  This
   section discusses some of those and presents preferred mitigations.

7.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a
   MIME-Version header field.  Without it, the Content-Type and
   associated fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.

   As described at the end of Section 6.1, this is not expected from a
   reputable content generator and is often an indication of
   mass-produced spam or other undesirable messages.
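
   As a hedged illustration, the condition described above (MIME fields
   present but no MIME-Version field) might be detected with a check
   like the following Python sketch.  It relies on the email package
   from the Python standard library.  The function name
   mime_fields_without_version is hypothetical, and treating every
   field whose name begins with "Content-" as MIME-specific is an
   assumption based on [MIME], not something this memo defines.

```python
from email import message_from_string


def mime_fields_without_version(raw: str) -> list[str]:
    """Return the names of MIME-specific fields in a message that has
    no MIME-Version field, or an empty list if MIME-Version is present.

    Assumption: a field is "MIME-specific" if its name starts with
    "Content-" (compared case-insensitively).
    """
    msg = message_from_string(raw)
    if msg["MIME-Version"] is not None:
        return []  # the field is present, so nothing to report
    return [
        name for name in msg.keys()
        if name.lower().startswith("content-")
    ]
```

   A non-empty result identifies the fields an agent would then have to
   deal with; an empty result means either that MIME-Version is present
   or that the message uses no such fields.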

   Therefore, an agent compliant with this specification MUST internally
   enact one or more of the following in the absence of a MIME-Version
   header field:

   1.  Ignore all other MIME-specific fields, even if they are
       syntactically valid, thus treating the entire message as a
       single-part message of type text/plain;

   2.  Remove all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message;

   3.  Rename all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message.

8.  IANA Considerations

   This memo contains no actions for IANA.

9.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practices enumerated in
   this memo are generally perceived to resolve security considerations
   that already exist rather than introducing new ones.

10.  References

10.1.  Normative References

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
                 October 2008.

10.2.  Informative References

   [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M.,
                 Fenton, J., and M. Thomas, "DomainKeys Identified Mail
                 (DKIM) Signatures", RFC 4871, May 2007.

   [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
                 Format for Delivery Status Notifications", RFC 3464,
                 January 2003.

   [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
                 July 2009.

   [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC822]      Crocker, D., "Standard for the Format of Internet Text
                 Messages", RFC 822, August 1982.

Appendix A.  Examples

   Examples, if needed, can go here.

Appendix B.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: (names)

Author's Address

   Murray S. Kucherawy
   Cloudmark, Inc.
   128 King St., 2nd Floor
   San Francisco, CA  94107
   US

   Phone: +1 415 946 3800
   EMail: msk@cloudmark.com