Individual submission                                       M. Kucherawy
Internet-Draft                                           Cloudmark, Inc.
Intended status: BCP                                       July 11, 2011
Expires: January 12, 2012


       Best Current Practices for Handling of Malformed Messages
                    draft-kucherawy-mta-malformed-03

Abstract

   The email ecosystem has long had a very permissive set of common
   processing rules in place, despite increasingly rigid standards
   governing its components, ostensibly to improve the user experience.
   Handling non-conforming messages comes at some cost, and various
   components are faced with decisions about whether to permit such
   messages to continue toward their destinations unaltered, to adjust
   them to conform (possibly at the cost of losing some of the original
   message), or to reject them outright.

   This memo includes a collection of the best current practices in a
   variety of such situations, to be used as implementation guidance.
   It must be emphasized, however, that the intent of this memo is not
   to standardize malformations or otherwise encourage their
   proliferation.  The messages that are the subject of this memo are
   manifestly malformed, and the code and culture that generates them
   needs to be fixed.  Nevertheless, many malformed messages from
   otherwise legitimate senders are in circulation and will be for some
   time and, unfortunately, commercial reality shows that we cannot
   simply reject or discard them.  Accordingly, this memo presents
   recommendations for dealing with them in ways that seem to do the
   least additional harm until the infrastructure is tightened up to
   match the standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


Kucherawy               Expires January 12, 2012                [Page 1]

Internet-Draft             Malformed Mail BCP                  July 2011


   This Internet-Draft will expire on January 12, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Purpose Of This Work
     1.2.  Not The Purpose Of This Work
   2.  Keywords
   3.  Background
   4.  Internal Representations
   5.  Mail Submission Agents
   6.  Header Anomalies
     6.1.  Non-Header Lines
     6.2.  Header Malformations
     6.3.  Header Field Counts
   7.  MIME Anomalies
     7.1.  Missing MIME-Version Field
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A.  Examples
   Appendix B.  Acknowledgements

1.  Introduction

1.1.  The Purpose Of This Work

   The history of email standards, going back to [RFC822] and beyond,
   contains a fairly rigid evolution of specifications.  But
   implementations within that culture have also long had an
   undercurrent known formally as the robustness principle, but also
   known informally as Postel's Law: "Be conservative in what you do, be
   liberal in what you accept from others."

   In general, this served the email ecosystem well by allowing a few
   errors in implementations without obstructing participation in the
   game.  The proverbial bar was set low.  However, as we have evolved
   into the current era, some of these lenient stances have begun to
   expose opportunities that can be exploited by malefactors.  Various
   email-based applications rely on strong application of these
   standards for simple security checks, while the very basic building
   blocks of that infrastructure, intending to be robust, fail utterly
   to assert those standards.

   This memo presents some areas in which the more lenient stances can
   provide vectors for attack, and then presents the collected wisdom of
   numerous applications in and around the email ecosystem for dealing
   with them to mitigate their impact.

1.2.  Not The Purpose Of This Work

   It is important to understand that this work is not an effort to
   endorse or standardize certain common malformations.  The code and
   culture that introduces such messages into the mail stream needs to
   be repaired, as the security penalty now being paid for this lax
   processing arguably outweighs the reduction in support costs to end
   users who are not expected to understand the standards.  However, the
   reality is that this will not be fixed quickly.

   Given this, it is beneficial to provide implementers with guidance
   about the safest or most effective way to handle malformed messages
   when they arrive, taking into consideration the tradeoffs of the
   choices available, especially with respect to how various actors in
   the email ecosystem respond to such messages in terms of handling,
   parsing, or rendering to end users.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

3.  Background

   The reader would benefit from reading [EMAIL-ARCH] for some general
   background about the overall email architecture.  Of particular
   interest is the Internet Message Format, detailed in [MAIL].
   Throughout this document, the term "message" should be taken to mean
   a block of text conforming to the Internet Message Format.

4.  Internal Representations

   Any agent handling a message could have two or more distinct
   representations of it.  One is an internal representation, such as a
   block of storage used for the header and another for the body.  These
   may be sorted, encoded, decoded, etc., as the needs of that
   particular module dictate.  The other is the representation that is
   output to the next agent in the handling chain.  This might be
   identical to the version that is input to the module, or it might
   have some changes such as added or reordered header fields, body
   modifications to remove malicious content, etc.

   In some cases, advice is provided only for internal representations.
   However, there is often occasion to mandate changes to the output as
   well.

5.  Mail Submission Agents

   Within the email context, the single most influential component that
   can reduce the presence of malformed items in the email system is the
   Mail Submission Agent (MSA).  This is the component that is
   essentially the interface between end users who create content and
   the mail stream.

   The lax processing described earlier in the document creates a high
   support and security cost overall.  Thus, MSAs MUST evolve to become
   more strict about enforcement of all relevant email standards,
   especially [MAIL] and the [MIME] family of documents.

   Relay Mail Transfer Agents (MTAs) SHOULD also be more strict.
   However, although preventing the dissemination of malformed messages
   is desirable, rejecting such mail once it is already in transit also
   has a support cost, namely the generation of a [DSN] that many end
   users might not understand.

6.  Header Anomalies

   This section covers common syntactical and semantic anomalies found
   in headers of messages, and presents preferred mitigations.

6.1.  Non-Header Lines

   It has been observed that some messages contain a line of text in the
   header that is not a valid message header field of any kind.  For
   example:

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       about the football game tonight
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   The cause of this is typically a bug in a message generator of some
   kind.  If the fourth line was intended to be a continuation of the
   third, it should have been indented by whitespace as set out in
   Section 2.2.3 of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (i.e., a CRLF immediately
       followed by another CRLF);

   2.  some agents interpret this anomaly to mean that the body starts at
       line four, treating anything that is not a valid header field or
       a folded portion thereof as the end of the header;

   3.  some agents assume this should be interpreted as an intended
       header folding as described above;

   4.  some agents reject this outright as line four is neither a valid
       header field nor a folded continuation of a header field prior to
       an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  For example, consider a filter trained to detect
   malicious body content (e.g., references to dangerous web sites) that
   is fed by a Mail Transfer Agent (MTA) implementing (1) above.  That
   MTA treats line four as part of the header, so the filter does not
   scan it as body content; yet an MUA implementing (2) would show line
   four to the user as body text.  Unless the filter is aware of the
   anomaly and checks for it itself, it might not get the opportunity to
   identify something dangerous in the message.
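
   The divergence between interpretations (1) and (2) can be seen in a
   minimal Python sketch.  It assumes the message has already been split
   into lines with CRLF removed; the function names are hypothetical,
   and a field line is only approximated by a field-name check rather
   than by a full [MAIL] parser.  It is illustrative only and is not
   drawn from any particular implementation.

```python
import re

# A field line: a field name made of printable US-ASCII other than ':'
# (RFC 5322 "ftext"), optional whitespace, then a colon.
FIELD_RE = re.compile(r"^[!-9;-~]+[ \t]*:")


def split_at_empty_line(lines):
    """Interpretation (1): the header ends only at the first empty line."""
    end = lines.index("") if "" in lines else len(lines)
    return lines[:end], lines[end + 1:]


def split_at_first_non_field(lines):
    """Interpretation (2): the header ends at the first empty line or at
    the first line that is neither a field nor a folded continuation."""
    for i, line in enumerate(lines):
        if line == "":
            return lines[:i], lines[i + 1:]
        folded = i > 0 and line[:1] in (" ", "\t")
        if not (FIELD_RE.match(line) or folded):
            return lines[:i], lines[i:]
    return lines, []


message = [
    "From: user@example.com",
    "To: userpal@example.net",
    "Subject: This is your reminder",
    "about the football game tonight",
    "Date: Wed, 20 Oct 2010 20:53:35 -0400",
    "",
    "Don't forget to meet us for the tailgate party!",
]

_, body1 = split_at_empty_line(message)
_, body2 = split_at_first_non_field(message)
print(len(body1), len(body2))  # prints "1 4"
```

   A body filter fed by an agent behaving like split_at_empty_line would
   see only the final line, while an MUA behaving like
   split_at_first_non_field would render line four, the Date field, and
   the empty line as part of the body.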

   Consensus indicates the preferred implementation is to terminate
   header processing before the first character in line four, as
   described in (2) above.  Thus, a module compliant with this
   specification MUST terminate header processing upon encountering the
   first line of text that is not a valid header field.  That is, all
   data after that point in the input MUST NOT be considered part of the
   header of the message.  If that line is not an empty line, an empty
   line MUST be inserted at that point in the emitted version of the
   message being processed.

   It should be noted that a few implementations make choice (4) above
   since any reputable message generation program will get header
   folding right, and thus anything so blatant as this malformation is
   likely an error caused by a malefactor.
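
   The required behavior (stop header processing at the first line that
   is not a valid header field, and emit an empty line there if one is
   not already present) might be sketched in Python as follows.  The
   sketch assumes the input is a list of lines with CRLF already
   removed; the function name normalize_header_end is hypothetical, and
   "valid header field" is approximated by a field-name check rather
   than a full [MAIL] parser.

```python
import re

# Approximation of a header field line: RFC 5322 field name ("ftext"),
# optional whitespace (obsolete syntax), then a colon.
FIELD_RE = re.compile(r"^[!-9;-~]+[ \t]*:")


def normalize_header_end(lines):
    """Return (header, body, emitted).

    Header processing stops at the first line that is neither a header
    field nor a folded continuation of one.  If that line is not empty,
    an empty line is inserted before it in the emitted message."""
    for i, line in enumerate(lines):
        if line == "":
            # Well-formed separator: nothing to change on output.
            return lines[:i], lines[i + 1:], list(lines)
        folded = i > 0 and line[:1] in (" ", "\t")
        if not (FIELD_RE.match(line) or folded):
            header, body = lines[:i], lines[i:]
            return header, body, header + [""] + body
    # Every line was a field: header only, empty body.
    return list(lines), [], list(lines)


header, body, emitted = normalize_header_end([
    "From: user@example.com",
    "Subject: This is your reminder",
    "about the football game tonight",
    "",
    "Don't forget to meet us for the tailgate party!",
])
print(len(header), emitted[2] == "")  # prints "2 True"
```

   Because the inserted empty line makes the header/body boundary
   explicit in the emitted message, a downstream agent that implements
   choice (1) will then agree with this module about where the body
   begins.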

6.2.  Header Malformations

   Various other malformations exist.  A common one is the insertion of
   whitespace at unusual locations, such as:

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       MIME-Version : 1.0
       Content-Type: text/plain
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   Note the addition of whitespace in line four after the header field
   name but before the colon that separates the name from the value.

   The obsolete syntax in Section 4 of [MAIL], which a conforming
   receiver is required to accept, permits that extra whitespace, so it
   cannot be considered invalid on input.  However, a consensus of
   implementations prefers to remove that whitespace.  There is no
   perceived change to the semantics of the header field being altered
   as the whitespace is itself semantically meaningless.  Thus, a module
   compliant with this memo MUST remove all whitespace after the field
   name but before the colon, and MUST emit that version of that field
   on output.
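
   A minimal Python sketch of that normalization follows.  It assumes
   it is applied only to lines already identified as header fields (not
   to folded continuations or body lines), and the function name is
   hypothetical.

```python
import re

# Field name (RFC 5322 "ftext": printable US-ASCII except ':'), then one
# or more spaces or tabs, then the colon that ends the name.
WSP_BEFORE_COLON = re.compile(r"^([!-9;-~]+)[ \t]+:")


def strip_wsp_before_colon(field_line):
    """Remove whitespace between a field name and its colon.

    The pattern is anchored at the start of the line, so colons inside
    the field value are left alone; lines without such whitespace are
    returned unchanged."""
    return WSP_BEFORE_COLON.sub(r"\1:", field_line, count=1)


print(strip_wsp_before_colon("MIME-Version : 1.0"))  # prints "MIME-Version: 1.0"
```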

6.3.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these, in the sense that
   messages whose header contents exceed one or more limits set there
   are generally allowed to pass; such agents may, however, add any
   required fields that are missing.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.  Two popular open source
   filtering programs and two popular Mailing List Management (MLM)
   packages examined at the time this memo was drafted select either the
   first or last instance of a particular field name, such as From, to
   decide who sent a message.  Absent enforcement of [MAIL], an attacker
   can craft a message with multiple instances of such a field if that
   attacker knows the filter will make a decision based on one instance
   while the user will be shown another.

   This situation is exacerbated when message validity is inferred from
   something like a valid [DKIM] signature.  Such a signature might
   cover one instance of a constrained field but not another, and a
   naive consumer of DKIM's output, not realizing which one was covered
   by a valid signature, might presume the wrong one was the "good" one.
   An MUA, for example, could show the first of two From fields as
   "good" or "safe" while the DKIM signature actually only verified the
   second.

   Thus, an agent compliant with this specification MUST enact one of
   the following:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its output, so
       that later agents can produce a consistent result.
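
   As a sketch of choice (1), an agent might compare field counts
   against the table in Section 3.6 of [MAIL] before processing a
   message further.  The Python below assumes field names have already
   been extracted from the header; the function name and the wording of
   the problem strings are hypothetical, and only the fields with a
   finite limit in that table are checked.

```python
from collections import Counter

# Maximum occurrences per Section 3.6 of RFC 5322 (lowercase names).
# Fields not listed (trace fields, Resent-*, Comments, Keywords and
# optional fields) have no upper limit there.
MAX_COUNT = {
    "date": 1, "from": 1, "sender": 1, "reply-to": 1,
    "to": 1, "cc": 1, "bcc": 1, "message-id": 1,
    "in-reply-to": 1, "references": 1, "subject": 1,
}
# Fields that Section 3.6 requires to appear exactly once.
REQUIRED = ("date", "from")


def field_count_problems(field_names):
    """Return a list of count violations; an empty list means conformance."""
    counts = Counter(name.strip().lower() for name in field_names)
    problems = []
    for name, limit in MAX_COUNT.items():
        if counts[name] > limit:
            problems.append(f"{name}: {counts[name]} instances (max {limit})")
    for name in REQUIRED:
        if counts[name] == 0:
            problems.append(f"{name}: missing")
    return problems


names = ["From", "From", "To", "Subject", "Date"]
problems = field_count_problems(names)
if problems:
    # Choice (1): reject or stop processing the message here.
    print("reject:", problems)
```

   The sketch lowercases names because field names in the [MAIL]
   grammar match case-insensitively, so "FROM" and "From" must count as
   the same field; otherwise an attacker could evade the check simply by
   varying case.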

7.  MIME Anomalies

   [MIME] and its companion documents define message extensions for
   providing text in character sets other than ASCII, non-text
   attachments to messages, multi-part message bodies, and similar
   facilities.

   Anomalies in the generation of MIME structure are also common.  This
   section discusses some of those and presents preferred mitigations.

7.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a
   MIME-Version header field.  Without it, the Content-Type and
   associated fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.

   As with the malformation described at the end of Section 6.1, this is
   not expected from a reputable content generator and is often an
   indication of mass-produced spam or other undesirable messages.

   Therefore, an agent compliant with this specification MUST internally
   enact one or more of the following in the absence of a MIME-Version
   header field:

   1.  Ignore all other MIME-specific fields, even if they are
       syntactically valid, thus treating the entire message as a
       single-part message of type text/plain;

   2.  Remove all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message;

   3.  Rename all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message.
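
   Choices (1) and (2) might be sketched in Python as follows, assuming
   the header has already been parsed into (name, value) pairs.  The
   function names are hypothetical, and treating every field whose name
   begins with "Content-" as MIME-specific is a simplifying assumption,
   not a definition taken from [MIME].

```python
def has_mime_version(headers):
    """headers: list of (name, value) pairs in message order."""
    return any(name.strip().lower() == "mime-version" for name, _ in headers)


def is_mime_field(name):
    # Assumption: MIME-specific fields are those named "Content-*".
    return name.strip().lower().startswith("content-")


def effective_content_type(headers):
    """Choice (1): without MIME-Version, ignore MIME fields and treat the
    message as single-part text/plain."""
    if not has_mime_version(headers):
        return "text/plain"
    for name, value in headers:
        if name.strip().lower() == "content-type":
            return value.strip()
    # RFC 2045 default type (its default charset, us-ascii, is omitted).
    return "text/plain"


def strip_mime_fields(headers):
    """Choice (2): remove MIME fields when MIME-Version is absent."""
    if has_mime_version(headers):
        return list(headers)
    return [(n, v) for n, v in headers if not is_mime_field(n)]


hdrs = [("From", "user@example.com"),
        ("Content-Type", "multipart/mixed; boundary=xyz")]
print(effective_content_type(hdrs))  # prints "text/plain"
print(strip_mime_fields(hdrs))       # prints "[('From', 'user@example.com')]"
```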

8.  IANA Considerations

   This memo contains no actions for IANA.

9.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practices enumerated in
   this memo are generally perceived to resolve security issues that
   already exist rather than to introduce new ones.

10.  References

10.1.  Normative References

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
                 October 2008.

10.2.  Informative References

   [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
                 J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
                 Signatures", RFC 4871, May 2007.

   [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
                 Format for Delivery Status Notifications", RFC 3464,
                 January 2003.

   [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
                 July 2009.

   [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC822]      Crocker, D., "Standard for the Format of Internet Text
                 Messages", RFC 822, August 1982.

Appendix A.  Examples

   Examples, if needed, can go here.

Appendix B.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: (names)

Author's Address

   Murray S. Kucherawy
   Cloudmark, Inc.
   128 King St., 2nd Floor
   San Francisco, CA  94107
   US

   Phone: +1 415 946 3800
   EMail: msk@cloudmark.com