Individual submission                                       M. Kucherawy
Internet-Draft                                           Cloudmark, Inc.
Intended status: BCP                                       July 11, 2011
Expires: January 12, 2012

       Best Current Practices for Handling of Malformed Messages
                    draft-kucherawy-mta-malformed-03

Abstract

   The email ecosystem has long had a very permissive set of common
   processing rules in place, ostensibly to improve the user experience,
   despite increasingly rigid standards governing its components.
   Handling non-conforming messages comes at some cost, and various
   components are faced with decisions about whether to permit such
   messages to continue toward their destinations unaltered, to adjust
   them to conform (possibly at the cost of losing some of the original
   message), or to reject them outright.

   This memo collects the best current practices for a variety of such
   situations, to be used as implementation guidance.  It must be
   emphasized, however, that the intent of this memo is not to
   standardize malformations or otherwise encourage their proliferation.
   The messages that are the subject of this memo are manifestly
   malformed, and the code and culture that generate them need to be
   fixed.  Nevertheless, many malformed messages from otherwise
   legitimate senders are in circulation and will be for some time, and,
   unfortunately, commercial reality shows that we cannot simply reject
   or discard them.  Accordingly, this memo presents recommendations for
   dealing with them in ways that seem to do the least additional harm
   until the infrastructure is tightened up to match the standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   This Internet-Draft will expire on January 12, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  The Purpose Of This Work
     1.2.  Not The Purpose Of This Work
   2.  Keywords
   3.  Background
   4.  Internal Representations
   5.  Mail Submission Agents
   6.  Header Anomalies
     6.1.  Non-Header Lines
     6.2.  Header Malformations
     6.3.  Header Field Counts
   7.  MIME Anomalies
     7.1.  Missing MIME-Version Field
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1. Normative References
     10.2. Informative References
   Appendix A.  Examples
   Appendix B.  Acknowledgements

1.  Introduction

1.1.  The Purpose Of This Work

   The history of email standards, going back to [RFC822] and beyond,
   shows a fairly rigid evolution of specifications.  Implementations
   within that culture, however, have long followed an undercurrent
   known formally as the robustness principle and informally as
   Postel's Law: "Be conservative in what you do, be liberal in what you
   accept from others."

   In general, this served the email ecosystem well by tolerating a few
   errors in implementations without obstructing participation in the
   game.  The proverbial bar was set low.  However, as we have evolved
   into the current era, some of these lenient stances have begun to
   expose opportunities that can be exploited by malefactors.  Various
   email-based applications rely on strict application of these
   standards for simple security checks, while the very basic building
   blocks of that infrastructure, intended to be robust, fail utterly to
   enforce those standards.

   This memo presents some areas in which the more lenient stances can
   provide vectors for attack, and then presents the collected wisdom of
   numerous applications in and around the email ecosystem for dealing
   with them in ways that mitigate their impact.

1.2.  Not The Purpose Of This Work

   It is important to understand that this work is not an effort to
   endorse or standardize certain common malformations.  The code and
   culture that introduce such messages into the mail stream need to be
   repaired, as the security penalty now being paid for this lax
   processing arguably outweighs the reduction in support costs to end
   users who are not expected to understand the standards.  However, the
   reality is that this will not be fixed quickly.

   Given this, it is beneficial to provide implementers with guidance
   about the safest or most effective way to handle malformed messages
   when they arrive, taking into consideration the tradeoffs of the
   available choices, especially with respect to how various actors in
   the email ecosystem respond to such messages in terms of handling,
   parsing, or rendering to end users.

2.  Keywords

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

3.  Background

   Readers will benefit from reading [EMAIL-ARCH] for general background
   about the overall email architecture.  Of particular interest is the
   Internet Message Format, detailed in [MAIL].  Throughout this
   document, the term "message" is to be understood as a block of text
   conforming to the Internet Message Format.

4.  Internal Representations

   Any agent handling a message could have more than one distinct
   representation of that message.  One is an internal representation,
   such as a block of storage used for the header and a block for the
   body.  These may be sorted, encoded, decoded, etc., as the particular
   module needs.  Another is the representation that is output to the
   next agent in the handling chain.  This might be identical to the
   version that was input to the module, or it might have changes such
   as added or reordered header fields, or body modifications that
   remove malicious content.

   In some cases, advice is provided only for internal representations.
   However, there is often occasion to mandate changes to the output as
   well.

5.  Mail Submission Agents

   Within the email context, the single most influential component that
   can reduce the presence of malformed items in the email system is the
   Mail Submission Agent (MSA).  This is the component that is
   essentially the interface between the end users who create content
   and the mail stream.

   The lax processing described earlier in this document creates a high
   support and security cost overall.  Thus, MSAs MUST evolve to become
   more strict about enforcement of all relevant email standards,
   especially [MAIL] and the [MIME] family of documents.

   Relay Mail Transfer Agents (MTAs) SHOULD also be more strict.
   Although preventing the dissemination of malformed messages is
   desirable, rejecting such mail once it is already in transit also has
   a support cost, namely the creation of a [DSN] that many end users
   might not understand.

6.  Header Anomalies

   This section covers common syntactic and semantic anomalies found in
   the headers of messages, and presents preferred mitigations.

6.1.  Non-Header Lines

   It has been observed that some messages contain a line of text in the
   header that is not a valid message header field of any kind.  For
   example:

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       about the football game tonight
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   The cause of this is typically a bug in a message generator of some
   kind.  If the fourth line was intended to be a continuation of the
   third, it should have been indented by whitespace as set out in
   Section 2.2.3 of [MAIL].

   This anomaly has varying impacts on processing software, depending on
   the implementation:

   1.  some agents choose to separate the header of the message from the
       body only at the first empty line (i.e., a CRLF immediately
       followed by another CRLF);

   2.  some agents interpret this anomaly to mean that the body starts
       at line four, treating the first line that is neither a valid
       header field nor a folded portion of one as the end of the
       header;

   3.  some agents interpret this as an intended header folding as
       described above;

   4.  some agents reject the message outright, as line four is neither
       a valid header field nor a folded continuation of a header field
       prior to an empty line.

   This can be exploited if it is known that one message handling agent
   will take one action while the next agent in the handling chain will
   take another.  For example, a filter trained to detect malicious body
   anomalies (e.g., references to dangerous web sites) that is fed by a
   Mail Transfer Agent (MTA) implementing (1) above might not get the
   opportunity to identify something dangerous in a message if it is
   unaware of the anomaly and does not itself check for it.

   Consensus indicates the preferred implementation is to terminate
   header processing before the first character in line four, as
   described in (2) above.  Thus, a module compliant with this
   specification MUST terminate header processing upon encountering the
   first line of text that is neither a valid header field nor a folded
   continuation of one.  That is, all data after that point in the input
   MUST NOT be considered part of the header of the message.  If that
   line is not an empty line, an empty line MUST be inserted at that
   point in the emitted version of the message being processed.
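
   The following is a non-normative sketch, in Python, of the header
   termination rule above.  It assumes the message has already been
   split into lines with CRLF removed, and it recognizes a header field
   by the field-name syntax of Section 3.6.8 of [MAIL], additionally
   tolerating whitespace before the colon (see Section 6.2).  The
   function names are hypothetical.

```python
import re

# Non-normative sketch; split_header_body() and emit() are hypothetical
# names.  A field name is one or more printable US-ASCII characters
# other than the colon; whitespace before the colon is tolerated.
FIELD_RE = re.compile(r"^[\x21-\x39\x3b-\x7e]+[ \t]*:")


def split_header_body(lines):
    """Split a list of lines (CRLF already removed) into (header, body).

    The header ends at the first empty line or at the first line that is
    neither a header field nor a folded continuation of one, whichever
    comes first; that line begins the body unless it is empty.
    """
    header = []
    for i, line in enumerate(lines):
        if line == "":
            # Normal case: the empty line is the separator, not body.
            return header, lines[i + 1:]
        if FIELD_RE.match(line):
            header.append(line)
        elif line[0] in " \t" and header:
            # Folded continuation of the preceding header field.
            header.append(line)
        else:
            # Not header material: header processing stops here.
            return header, lines[i:]
    return header, []


def emit(header, body):
    """Emit the message with exactly one empty line after the header,
    which inserts the separator when the original message lacked one."""
    return "\r\n".join(header + [""] + body)
```

   Applied to the example in Section 6.1, split_header_body() places
   the line "about the football game tonight" and everything after it
   in the body, and emit() inserts an empty line before it.  An agent
   implementing behavior (1) would instead treat the first five lines as
   the header.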

   It should be noted that a few implementations make choice (4) above,
   on the grounds that any reputable message generation program gets
   header folding right, so a malformation as blatant as this one is
   likely the work of a malefactor.

6.2.  Header Malformations

   Various other malformations exist.  A common one is the insertion of
   whitespace at unusual locations, such as:

       From: user@example.com
       To: userpal@example.net
       Subject: This is your reminder
       MIME-Version : 1.0
       Content-Type: text/plain
       Date: Wed, 20 Oct 2010 20:53:35 -0400

       Don't forget to meet us for the tailgate party!

   Note the addition of whitespace in line four after the header field
   name but before the colon that separates the name from the value.

   The acceptance grammar of [MAIL] (specifically, the obsolete syntax
   described in Section 4 of that document) permits that extra
   whitespace, so it cannot be considered invalid.  However, a consensus
   of implementations prefers to remove it.  Doing so causes no
   perceived change to the semantics of the header field, as the
   whitespace is itself semantically meaningless.  Thus, a module
   compliant with this memo MUST remove all whitespace after the field
   name but before the colon, and MUST emit that version of that field
   on output.
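
   As a non-normative illustration, the whitespace removal required
   above can be done with a single substitution anchored at the start of
   a header field line.  The Python sketch below assumes its input is
   the first line of a header field without its CRLF; the function name
   is hypothetical.

```python
import re

# Non-normative sketch; normalize_field_name() is a hypothetical name.
# Field name, then one or more spaces or tabs, then the colon.  Only the
# name is captured, so the whitespace is dropped by the substitution.
NAME_WSP_COLON = re.compile(r"^([\x21-\x39\x3b-\x7e]+)[ \t]+:")


def normalize_field_name(line):
    """Return the line with whitespace before the colon removed.

    The field value is left untouched, and lines without such whitespace
    (including folded continuation lines) are returned unchanged.
    """
    return NAME_WSP_COLON.sub(r"\1:", line, count=1)
```

   With this sketch, "MIME-Version : 1.0" becomes "MIME-Version: 1.0",
   while "Subject: a : b" is left alone, because the match must begin
   at the start of the line and the field name cannot contain a colon.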

6.3.  Header Field Counts

   Section 3.6 of [MAIL] prescribes specific header field counts for a
   valid message.  Few agents actually enforce these: a message whose
   header exceeds one or more of the limits set there is generally
   allowed to pass, although agents may add any required fields that
   are missing.

   Also, few agents that use messages as input, including Mail User
   Agents (MUAs) that actually display messages to users, verify that
   the input is valid before proceeding.  Two popular open source
   filtering programs and two popular Mailing List Management (MLM)
   packages examined at the time this memo was drafted select either the
   first or the last instance of a particular field name, such as From,
   to decide who sent a message.  Absent enforcement of [MAIL], an
   attacker can craft a message with multiple instances of such a field
   if that attacker knows the filter will make a decision based on one
   instance but the user will be shown another.

   This situation is exacerbated when message validity is inferred from
   something like a valid [DKIM] signature.  Such a signature might
   cover one instance of a constrained field but not another, and a
   naive consumer of DKIM's output, not realizing which one was covered
   by a valid signature, might presume the wrong one was the "good" one.
   An MUA, for example, could show the first of two From fields as
   "good" or "safe" while the DKIM signature actually only verified the
   second.
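
   The mismatch arises because, per Section 5.4 of [DKIM], when a field
   name listed in a signature's "h=" tag occurs more than once in the
   header, instances are selected from the bottom of the header upward.
   The following non-normative Python sketch, whose function and
   parameter names are hypothetical, shows which instances a verifier
   would actually hash, as opposed to the first instance that a naive
   MUA might display.

```python
# Non-normative sketch; dkim_signed_instances() is a hypothetical name.
def dkim_signed_instances(fields, h_tag_names):
    """Return the (name, value) instances a DKIM verifier would hash.

    fields: header fields as (name, value) pairs, top to bottom.
    h_tag_names: field names from the signature's h= tag, in order.
    Each occurrence of a name in h= consumes the lowest not-yet-used
    instance of that field; a name with no instance left adds nothing.
    """
    unused = {}
    for name, value in fields:
        unused.setdefault(name.lower(), []).append((name, value))
    selected = []
    for name in h_tag_names:
        instances = unused.get(name.lower(), [])
        if instances:
            # The list is in top-to-bottom order, so pop() takes the
            # bottom-most instance that has not been selected yet.
            selected.append(instances.pop())
    return selected
```

   For a header containing "From: security@example.com" followed later
   by "From: someone@example.net", a signature whose h= tag lists
   "from" once covers only the second, bottom-most instance, while an
   MUA that displays the first From field would show example.com as the
   author.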

   Thus, an agent compliant with this specification MUST enact one of
   the following:

   1.  reject outright or refuse to process further any input message
       that does not conform to Section 3.6 of [MAIL];

   2.  remove or, in the case of an MUA, refuse to render any instances
       of a header field whose presence exceeds a limit prescribed in
       Section 3.6 of [MAIL] when generating its output;

   3.  alter the name of any header field whose presence exceeds a limit
       prescribed in Section 3.6 of [MAIL] when generating its output so
       that later agents can produce a consistent result.
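
   A non-normative Python sketch of the check underlying option 1
   follows.  The single-instance and required fields are taken from the
   table in Section 3.6 of [MAIL]; trace fields, resent fields,
   Comments, Keywords, and optional fields have no upper limit there and
   are not checked.  The function name is hypothetical, and header
   fields are assumed to be already parsed into (name, value) pairs.

```python
from collections import Counter

# Non-normative sketch; field_count_violations() is a hypothetical name.
# Fields that Section 3.6 of [MAIL] limits to at most one instance.
SINGLE_INSTANCE = {
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
}
# Fields that Section 3.6 of [MAIL] requires to be present.
REQUIRED = {"date", "from"}


def field_count_violations(fields):
    """Return the sorted, lower-cased names of fields whose counts do not
    conform to Section 3.6 of [MAIL]: a single-instance field that occurs
    more than once, or a required field that is absent."""
    counts = Counter(name.lower() for name, _ in fields)
    too_many = {name for name in SINGLE_INSTANCE if counts[name] > 1}
    missing = {name for name in REQUIRED if counts[name] == 0}
    return sorted(too_many | missing)
```

   An agent choosing option 1 would reject the message whenever the
   returned list is non-empty; an agent choosing option 2 or 3 could use
   the same list to decide which fields to remove or rename.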

7.  MIME Anomalies

   [MIME] and its companion documents define a mechanism of message
   extensions for providing text in character sets other than ASCII,
   non-text attachments to messages, multi-part message bodies, and
   similar facilities.

   Anomalies in the generation of MIME messages are also common.  This
   section discusses some of them and presents preferred mitigations.

7.1.  Missing MIME-Version Field

   Any message that uses [MIME] constructs is required to have a
   MIME-Version header field.  Without it, the Content-Type and
   associated fields have no semantic meaning.

   It is often observed that a message has complete MIME structure, yet
   lacks this header field.

   As described at the end of Section 6.1, this is not expected from a
   reputable content generator and is often an indication of
   mass-produced spam or other undesirable messages.

   Therefore, an agent compliant with this specification MUST internally
   enact one or more of the following in the absence of a MIME-Version
   header field:

   1.  Ignore all other MIME-specific fields, even if they are
       syntactically valid, thus treating the entire message as a
       single-part message of type text/plain;

   2.  Remove all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message;

   3.  Rename all other MIME-specific fields, even if they are
       syntactically valid, both internally and when emitting the output
       version of the message.
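
   The following non-normative Python sketch shows options 1 and 2.  It
   assumes header fields are already parsed into (name, value) pairs.
   The set of "MIME-specific fields" below is an assumption of this
   sketch rather than a definition from this memo: the first four names
   are defined in [MIME], and Content-Disposition comes from a later
   MIME document.  The function names are hypothetical.

```python
# Non-normative sketch; all names here are hypothetical.
# Assumed set of "MIME-specific fields" for this sketch.
MIME_FIELDS = {
    "content-type", "content-transfer-encoding", "content-id",
    "content-description", "content-disposition",
}


def has_mime_version(fields):
    return any(name.lower() == "mime-version" for name, _ in fields)


def effective_media_type(fields):
    """Option 1: without MIME-Version, ignore the MIME fields and treat
    the whole message as a single text/plain part."""
    if not has_mime_version(fields):
        return "text/plain"
    for name, value in fields:
        if name.lower() == "content-type":
            # Keep only "type/subtype"; parameters follow the semicolon.
            return value.split(";", 1)[0].strip().lower()
    # [MIME] specifies text/plain as the default Content-Type.
    return "text/plain"


def strip_mime_fields(fields):
    """Option 2: without MIME-Version, drop the MIME fields entirely."""
    if has_mime_version(fields):
        return list(fields)
    return [(n, v) for n, v in fields if n.lower() not in MIME_FIELDS]
```

   Option 3 would follow the same pattern as strip_mime_fields(), but
   would rewrite the matching field names instead of dropping the
   fields; the replacement naming scheme is a local choice.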

8.  IANA Considerations

   This memo contains no actions for IANA.

9.  Security Considerations

   The discussions of the anomalies above and their prescribed solutions
   are themselves security considerations.  The practices enumerated in
   this memo are generally perceived to resolve security issues that
   already exist rather than introducing new ones.

10.  References

10.1.  Normative References

   [KEYWORDS]    Bradner, S., "Key words for use in RFCs to Indicate
                 Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MAIL]        Resnick, P., "Internet Message Format", RFC 5322,
                 October 2008.

10.2.  Informative References

   [DKIM]        Allman, E., Callas, J., Delany, M., Libbey, M., Fenton,
                 J., and M. Thomas, "DomainKeys Identified Mail (DKIM)
                 Signatures", RFC 4871, May 2007.

   [DSN]         Moore, K. and G. Vaudreuil, "An Extensible Message
                 Format for Delivery Status Notifications", RFC 3464,
                 January 2003.

   [EMAIL-ARCH]  Crocker, D., "Internet Mail Architecture", RFC 5598,
                 July 2009.

   [MIME]        Freed, N. and N. Borenstein, "Multipurpose Internet
                 Mail Extensions (MIME) Part One: Format of Internet
                 Message Bodies", RFC 2045, November 1996.

   [RFC822]      Crocker, D., "Standard for the Format of Internet Text
                 Messages", RFC 822, August 1982.

Appendix A.  Examples

   Examples, if needed, can go here.

Appendix B.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: (names)

Author's Address

   Murray S. Kucherawy
   Cloudmark, Inc.
   128 King St., 2nd Floor
   San Francisco, CA  94107
   US

   Phone: +1 415 946 3800
   EMail: msk@cloudmark.com