idnits 2.17.1 

draft-ietf-appsawg-malformed-mail-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 22, 2013) is 3808 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  -- Obsolete informational reference (is this intentional?): RFC 1113 (ref.
     'PEM') (Obsoleted by RFC 1421)

  -- Obsolete informational reference (is this intentional?): RFC 2822
     (Obsoleted by RFC 5322)

  -- Obsolete informational reference (is this intentional?): RFC  733
     (Obsoleted by RFC 822)


     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	APPSAWG                                                     M. Kucherawy
3	Internet-Draft                                                G. Shapiro
4	Intended status: Informational                                  N. Freed
5	Expires: May 26, 2014                                  November 22, 2013

7	             Advice for Safe Handling of Malformed Messages
8	                  draft-ietf-appsawg-malformed-mail-11

10	Abstract

12	   Although Internet mail formats have been precisely defined since the
13	   1970s, authoring and handling software often show only mild
14	   conformance to the specifications.  The malformed messages that
15	   result are non-standard.  Nonetheless, decades of experience have
16	   shown that handling with some tolerance the malformations that result
17	   is often an acceptable approach, and is better than rejecting the
18	   messages outright as nonconformant.  This document includes a
19	   collection of the best advice available regarding a variety of common
20	   malformed mail situations, to be used as implementation guidance.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on May 26, 2014.

39	Copyright Notice

41	   Copyright (c) 2013 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	     1.1.  The Purpose Of This Work
58	     1.2.  Not The Purpose Of This Work
59	     1.3.  General Considerations
60	   2.  Document Conventions
61	     2.1.  Examples
62	   3.  Background
63	   4.  Invariant Content
64	   5.  Mail Submission Agents
65	   6.  Line Termination
66	   7.  Header Anomalies
67	     7.1.  Converting Obsolete and Invalid Syntaxes
68	       7.1.1.  Host-Address Syntax
69	       7.1.2.  Excessive Angle Brackets
70	       7.1.3.  Unbalanced Angle Brackets
71	       7.1.4.  Unbalanced Parentheses
72	       7.1.5.  Commas in Address Lists
73	       7.1.6.  Unbalanced Quotes
74	       7.1.7.  Naked Local-Parts
75	     7.2.  Non-Header Lines
76	     7.3.  Unusual Spacing
77	     7.4.  Header Malformations
78	     7.5.  Header Field Counts
79	       7.5.1.  Repeated Header Fields
80	       7.5.2.  Missing Header Fields
81	       7.5.3.  Return-Path
82	     7.6.  Missing or Incorrect Charset Information
83	     7.7.  Eight-Bit Data
84	   8.  MIME Anomalies
85	     8.1.  Missing MIME-Version Field
86	     8.2.  Faulty Encodings
87	   9.  Body Anomalies
88	     9.1.  Oversized Lines
89	   10. Security Considerations
90	   11. IANA Considerations
91	   12. References
92	     12.1. Normative References
93	     12.2. Informative References
94	   Appendix A.  RFC Editor Notes
95	   Appendix B.  Acknowledgements

97	1.  Introduction

99	1.1.  The Purpose Of This Work

101	   The history of email standards, going back to [RFC733] and beyond,
102	   contains a fairly rigid evolution of specifications.  However,
103	   implementations within that culture have also long had an
104	   undercurrent known formally as the robustness principle, also known
105	   informally as Postel's Law: "Be liberal in what you accept, and
106	   conservative in what you send."  [RFC1122]

108	   Jon Postel's directive is often misinterpreted to mean that any
109	   deviance from a specification is acceptable.  Rather, it was intended
110	   only to account for legitimate variations in interpretation within
111	   specifications, as well as basic transit errors, like bit errors.
112	   Taken to its unintended extreme, excessive tolerance would imply that
113	   there are no limits to the liberties that a sender might take, while
114	   presuming a burden on a receiver to guess "correctly" at the meaning
115	   of any such variation.  These matters are further compounded by
116	   receiver software -- the end users' mail readers -- which are also
117	   sometimes flawed, leaving senders to craft messages (sometimes
118	   bending the rules) to overcome those flaws.

120	   In general, this served the email ecosystem well by allowing a few
121	   errors in implementations without obstructing participation in the
122	   game.  The proverbial bar was set low.  However, as we have evolved
123	   into the current era, some of these lenient stances have begun to
124	   expose opportunities that can be exploited by malefactors.  Various
125	   email-based applications rely on strong application of these
126	   standards for simple security checks, while the very basic building
127	   blocks of that infrastructure, intending to be robust, fail utterly
128	   to assert those standards.

130	   The distributed and non-interactive nature of email has often
131	   prompted adjustments to receiving software, to handle these
132	   variations, rather than trying to gain better conformance by senders,
133	   since the receiving operator is primarily driven by complaints from
134	   recipient users and has no authority over the sending side of the
135	   system.  Processing with such flexibility comes at some cost, since
136	   mail software is faced with decisions about whether to permit non-
137	   conforming messages to continue toward their destinations unaltered,
138	   adjust them to conform (possibly at the cost of losing some of the
139	   original message), or reject them outright.

141	   This document includes a collection of the best advice available
142	   regarding a variety of common malformed mail situations, to be used
143	   as implementation guidance.  These malformations are typically based
144	   around loose interpretations or implementations of specifications
145	   such as Internet Message Format [MAIL] and Multipurpose Internet Mail
146	   Extensions [MIME].

148	1.2.  Not The Purpose Of This Work

150	   It is important to understand that this work is not an effort to
151	   endorse or standardize certain common malformations.  The code and
152	   culture that introduces such messages into the mail stream needs to
153	   be repaired, as the security penalty now being paid for this lax
154	   processing arguably outweighs the reduction in support costs to end
155	   users who are not expected to understand the standards.  However, the
156	   reality is that this will not be fixed quickly.

158	   Given this, it is beneficial to provide implementers with guidance
159	   about the safest or most effective way to handle malformed messages
160	   when they arrive, taking into consideration the tradeoffs of the
161	   choices available especially with respect to how various actors in
162	   the email ecosystem respond to such messages in terms of handling,
163	   parsing, or rendering to end users.

165	1.3.  General Considerations

167	   Many deviations from message format standards are considered by some
168	   receivers to be strong indications that the message is undesirable,
169	   such as spam or something containing malware.  These receivers
170	   quickly decide that the best handling choice is simply to reject or
171	   discard the message.  This means malformations caused by innocent
172	   misunderstandings or ignorance of proper syntax can cause messages
173	   with no ill intent also to fail to be delivered.

175	   Senders that want to ensure message delivery are best advised to
176	   adhere strictly to the relevant standards (including, but not limited
177	   to, [MAIL], [MIME], and [DKIM]), as well as observe other industry
178	   best practices such as may be published from time to time either by
179	   the IETF or independently.

181	   Receivers that haven't the luxury of strict enforcement of the
182	   standards on inbound messages are usually best served by observing
183	   the following guidelines for handling of malformed messages:

185	   1.  Whenever possible, mitigation of syntactic malformations should
186	       be guided by an assessment of the most likely semantic intent.
187	       For example, it is reasonable to conclude that multiple sets of
188	       angle brackets around an address are simply superfluous and can be
189	       dropped.

191	   2.  When the intent is unclear, or when it is clear but also
192	       impractical to change the content to reflect that intent,
193	       mitigation should be limited to cases where not taking any
194	       corrective action would clearly lead to a worse outcome.

196	   3.  Security issues, when present, need to be addressed and may force
197	       mitigation strategies that are otherwise suboptimal.

199	2.  Document Conventions

201	2.1.  Examples

203	   Examples of message content include a number within braces at the end
204	   of each line.  These are line numbers for use in subsequent
205	   discussion, and are not actually part of the message content
206	   presented in the example.

208	   Blank lines are not numbered in the examples.

210	3.  Background

212	   The reader would benefit from reading [EMAIL-ARCH] for some general
213	   background about the overall email architecture.  Of particular
214	   interest is the Internet Message Format, detailed in [MAIL].
215	   Throughout this document, the use of the term "message" should be
216	   assumed to mean a block of text conforming to the Internet Message
217	   Format.

219	4.  Invariant Content

221	   An agent handling a message could use several distinct
222	   representations of the message.  One is an internal representation,
223	   such as separate blocks of storage for the header and body, some
224	   header or body alterations, or tables indexed by header name, set up
225	   to make particular kinds of processing easier.  Another is the
226	   representation passed along to the next agent in the handling chain.
227	   This might be identical to the message input to the module, or it
228	   might have some changes such as added or reordered header fields or
229	   body elisions to remove malicious content.

231	   Message handling is usually most effective when each in a sequence of
232	   handling modules receives the same content for analysis.  A module
233	   that "fixes" or otherwise alters the content passed to later modules
234	   can prevent the later modules from identifying malicious or other
235	   content that exposes the end user to harm.  It is important that all
236	   processing modules can make consistent assertions about the content.
237	   Modules that operate sequentially sometimes add private header fields
238	   to relay information downstream for later filters to use (and
239	   possibly remove), or they may have out-of-band ways of doing so.
240	   However, even the presence of private header fields can impact a
241	   downstream handling agent unaware of its local semantics, so an out-
242	   of-band method is always preferable.

244	   The above is less of a concern when multiple analysis modules are
245	   operated in parallel, independent of one another.

247	   Often, abuse reporting systems can act effectively only when a
248	   complaint or report contains the original message exactly as it was
249	   generated.  Messages that have been altered by handling modules might
250	   render a complaint inactionable as the system receiving the report
251	   may be unable to identify the original message as one of its own.

253	   Some message changes alter syntax without changing semantics.  For
254	   example, Section 7.4 describes a situation where an agent removes
255	   additional header whitespace.  This is a syntax change without a
256	   change in semantics, though some systems (such as DKIM) are sensitive
257	   to such changes.  Message system developers need to be aware of the
258	   downstream impact of making either kind of change.

260	   Where a change to content between modules is unavoidable, adding
261	   trace data (such as prepending a standard Received field) will at
262	   least allow tracing of the handling by modules that actually see
263	   different input.

265	   There will always be local handling exceptions, but these guidelines
266	   should be useful for developing integrated message processing
267	   environments.

269	   In most cases, this document only discusses techniques used on
270	   internal representations.  It is occasionally necessary to make
271	   changes between the input and output versions; such cases will be
272	   called out explicitly.

274	5.  Mail Submission Agents

276	   Within the email context, the single most influential component that
277	   can reduce the presence of malformed items in the email system is the
278	   Mail Handling Service (MHS; see [EMAIL-ARCH]), which includes the
279	   Mail Submission Agent (MSA).  This is the component that is
280	   essentially the interface between end users that create content and
281	   the mail stream.

283	   MHSes need to become more strict about enforcement of all relevant
284	   email standards, especially [MAIL] and the [MIME] family of
285	   documents.

287	   More strict conformance by relaying Mail Transfer Agents (MTAs) will
288	   also be helpful.  Although preventing the dissemination of malformed
289	   messages is desirable, the rejection of such mail already in transit
290	   also has a support cost, namely the creation of a [DSN] that many end
291	   users might not understand.

293	6.  Line Termination

295	   For interoperable Internet Mail messages, the only valid line
296	   separation sequence during a typical SMTP session is ASCII 0x0D
297	   ("carriage return", or CR) followed by ASCII 0x0A ("line feed", or
298	   LF), commonly referred to as CRLF.  This is not the case for binary
299	   mode SMTP (see [BINARYSMTP]).

301	   Common UNIX user tools, however, typically only use LF for internal
302	   line termination.  This means that a protocol engine that converts
303	   between UNIX and Internet Mail formats has to convert between these
304	   two end-of-line representations before transmitting a message or
305	   after receiving it.

307	   Non-compliant implementations can create messages with a mix of line
308	   terminations, such as LF everywhere except CRLF only at the end of
309	   the message.  According to [SMTP] and [MAIL], this means the entire
310	   message actually exists on a single line.

312	   Within modern Internet Mail it is highly unlikely that an isolated CR
313	   or LF is valid in common ASCII text.  Furthermore, when content
314	   actually does need to contain such an unusual character sequence,
315	   [MIME] provides mechanisms for encoding that content in an SMTP-safe
316	   manner.

318	   Thus, it will typically be safe and helpful to treat an isolated CR
319	   or LF as equivalent to a CRLF when parsing a message.

321	   Note that this advice pertains only to the raw SMTP data, and not to
322	   decoded MIME entities.  As noted above, when MIME encoding mechanisms
323	   are used, the unusual character sequences are not visible in the raw
324	   SMTP stream.
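
   The treatment of isolated CR and LF characters described in this
   section could be sketched as follows.  This is only an illustration:
   the function name is hypothetical, the input is assumed to be the raw
   message data as bytes, and binary-mode SMTP and decoded MIME entities
   are assumed to be out of scope.

```python
import re

# Alternation order matters: a real CRLF is matched first and left
# unchanged; only then are isolated CR or LF characters considered.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def normalize_line_endings(raw: bytes) -> bytes:
    """Treat every isolated CR or LF in raw message data as a CRLF.

    Hypothetical helper; intended for raw SMTP data only, never for
    decoded MIME content.
    """
    return _LINE_BREAK.sub(b"\r\n", raw)
```

   Applying the substitution in a single pass avoids converting the LF
   of an existing CRLF a second time.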

326	7.  Header Anomalies

328	   This section covers common syntactic and semantic anomalies found in
329	   a message header, and presents suggested mitigations.

331	7.1.  Converting Obsolete and Invalid Syntaxes

333	   A message using an obsolete header syntax (see Section 4 of [MAIL])
334	   might confound an agent that is attempting to be robust in its
335	   handling of syntax variations.  A bad actor could exploit such a
336	   weakness in order to get abusive or malicious content through a
337	   filter.  This section presents some examples of such variations.
338	   Messages including them ought to be rejected; where this is not
339	   possible, recommended internal interpretations are provided.

341	7.1.1.  Host-Address Syntax

343	   The following obsolete syntax attempts to specify source routing:

345	       To: <@example.net:fran@example.com>

347	   This means "send to fran@example.com via the mail service at
348	   example.net".  It can safely be interpreted as:

350	       To: <fran@example.com>
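
   One way an agent might derive this internal interpretation is
   sketched below.  The function name and the pattern are assumptions
   made for illustration; the pattern handles only a route that directly
   follows an opening angle bracket and does not attempt to cope with
   domain literals that contain colons.

```python
import re

# Hypothetical pattern: "<", optional whitespace, then a route such as
# "@example.net:" or "@a.example,@b.example:" up to the first colon.
_ROUTE = re.compile(r"<\s*@[^:<>]*:")


def strip_source_route(value: str) -> str:
    """Drop an obsolete source route from an address field value."""
    return _ROUTE.sub("<", value)
```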

352	7.1.2.  Excessive Angle Brackets

354	   The following over-use of angle brackets:

356	       To: <<<user2@example.org>>>

358	   can safely be interpreted as:

360	       To: <user2@example.org>
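
   A minimal sketch of this interpretation, assuming the field value is
   already available as a string and that runs of repeated brackets
   carry no meaning; the function name is hypothetical:

```python
import re


def collapse_angle_brackets(value: str) -> str:
    """Reduce runs of repeated '<' or '>' to a single bracket."""
    value = re.sub(r"<{2,}", "<", value)
    return re.sub(r">{2,}", ">", value)
```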

362	7.1.3.  Unbalanced Angle Brackets

364	   The following use of unbalanced angle brackets:

366	       To: <another@example.net

368	   can usually be treated as:

370	       To: <another@example.net>

372	   The following:

374	       To: second@example.org>

376	   can usually be treated as:

378	       To: second@example.org
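
   The two repairs above might be sketched as follows.  The sketch
   assumes it is handed a single address (not a list), and the function
   name is hypothetical.  Inputs that do not match either pattern are
   returned unchanged rather than guessed at.

```python
def balance_angle_brackets(addr: str) -> str:
    """Repair a single address with one unmatched angle bracket."""
    addr = addr.strip()
    opens, closes = addr.count("<"), addr.count(">")
    if opens == 1 and closes == 0:
        # "<another@example.net" -> "<another@example.net>"
        return addr + ">"
    if opens == 0 and closes == 1 and addr.endswith(">"):
        # "second@example.org>" -> "second@example.org"
        return addr[:-1].rstrip()
    return addr
```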

380	7.1.4.  Unbalanced Parentheses

382	   The following use of unbalanced parentheses:

384	       To: (Testing <fran@example.com>

386	   can safely be interpreted as:

388	       To: (Testing) <fran@example.com>

390	   Likewise, this case:

392	       To: Testing) <sam@example.com>

394	   can safely be interpreted as:

396	       To: "Testing)" <sam@example.com>

398	   In both cases, it is obvious where the active email address in the
399	   string can be found.  The former case retains the active email
400	   address in the string by completing what appears to be intended as a
401	   comment; the intent in the latter case is less obvious, so the
402	   leading string is interpreted as a display name.
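
   Both interpretations could be implemented along the following lines.
   This is a sketch only: the function name is hypothetical, the input
   is assumed to be a single address whose active part is enclosed in
   angle brackets, and a leading string that itself contains quotation
   marks is not handled.

```python
def repair_parentheses(value: str) -> str:
    """Balance parentheses in the text that precedes '<'."""
    head, sep, tail = value.partition("<")
    if not sep:
        return value  # no angle-bracketed address; leave untouched
    head = head.strip()
    opens, closes = head.count("("), head.count(")")
    if opens == closes:
        return value
    if opens > closes:
        # Complete what appears to be intended as a comment.
        head = head + ")" * (opens - closes)
    else:
        # Intent is unclear; treat the leading string as a display name.
        head = '"' + head + '"'
    return head + " <" + tail
```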

404	7.1.5.  Commas in Address Lists

406	   This use of an errant comma:

408	       To: <third@example.net, fourth@example.net>

410	   can usually be interpreted as ending an address, so the above is
411	   usually best interpreted as:

413	       To: third@example.net, fourth@example.net
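
   As an illustration, the comma could be treated as an address
   separator as sketched below.  The function name and pattern are
   assumptions; the sketch does not account for commas inside quoted
   strings or comments.

```python
import re

# Hypothetical pattern: one pair of angle brackets enclosing a comma.
_BRACKETED_LIST = re.compile(r"<([^<>]*,[^<>]*)>")


def split_bracketed_list(value: str) -> str:
    """Rewrite '<a, b>' as the address list 'a, b'."""

    def _rewrite(match: re.Match) -> str:
        parts = [p.strip() for p in match.group(1).split(",")]
        return ", ".join(p for p in parts if p)

    return _BRACKETED_LIST.sub(_rewrite, value)
```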

415	7.1.6.  Unbalanced Quotes

417	   The following use of unbalanced quotation marks:

419	       To: "Joe <joe@example.com>

421	   leaves software with no obvious "good" interpretation.  If it is
422	   essential to extract an address from the above, one possible
423	   interpretation is:

425	       To: "Joe <joe@example.com>"@example.net

427	   where "example.net" is the domain name or host name of the handling
428	   agent making the interpretation.  Another possible interpretation,
429	   much simpler and likely more correct, is simply:

431	       To: "Joe" <joe@example.com>

433	7.1.7.  Naked Local-Parts

435	   [MAIL] defines a local-part as the user portion of an email address,
436	   and the display-name as the "user-friendly" label that accompanies
437	   the address specification.

439	   Some broken submission agents might introduce messages with only a
440	   local-part or only a display-name and no properly formed address.
441	   For example:

443	       To: Joe

445	   A submission agent ought to reject this or, at a minimum, append "@"
446	   followed by its own host name or some other valid name likely to
447	   enable a reply to be delivered to the correct mailbox.  Where this is
448	   not done, an agent receiving such a message will probably be
449	   successful by synthesizing a valid header field for evaluation using
450	   the techniques described in Section 7.5.2.

452	7.2.  Non-Header Lines

454	   Some messages contain a line of text in the header that is not a
455	   valid message header field of any kind.  For example:

457	       From: user@example.com {1}
458	       To: userpal@example.net {2}
459	       Subject: This is your reminder {3}
460	       about the football game tonight {4}
461	       Date: Wed, 20 Oct 2010 20:53:35 -0400 {5}

463	       Don't forget to meet us for the tailgate party! {7}

465	   The cause of this is typically a bug in a message generator of some
466	   kind.  Line {4} was intended to be a continuation of line {3}; it
467	   should have been indented by whitespace as set out in Section 2.2.3
468	   of [MAIL].

470	   This anomaly has varying impacts on processing software, depending on
471	   the implementation:

473	   1.  some agents choose to separate the header of the message from the
474	       body only at the first empty line (that is, a CRLF immediately
475	       followed by another CRLF);

477	   2.  some agents assume this anomaly should be interpreted to mean the
478	       body starts at line {4}, as the end of the header is assumed by
479	       encountering something that is not a valid header field or folded
480	       portion thereof;

482	   3.  some agents assume this should be interpreted as an intended
483	       header folding as described above and thus simply append a single
484	       space character (ASCII 0x20) and the content of line {4} to that
485	       of line {3};

487	   4.  some agents reject this outright as line {4} is neither a valid
488	       header field nor a folded continuation of a header field prior to
489	       an empty line.

491	   This can be exploited if it is known that one message handling agent
492	   will take one action while the next agent in the handling chain will
493	   take another.  Consider, for example, a message filter that searches
494	   message headers for properties indicative of abusive or malicious
495	   content that is attached to a Mail Transfer Agent (MTA) implementing
496	   option 2 above.  An attacker could craft a message that includes this
497	   malformation at a position above the property of interest, knowing
498	   the MTA will not consider that content part of the header, and thus
499	   the MTA will not feed it to the filter, thus avoiding detection.
500	   Meanwhile, the Mail User Agent (MUA), which presents the content to
501	   an end user, implements option 1 or 3, which has some undesirable
502	   effect.

504	   It should be noted that a few implementations choose option 4 above
505	   since any reputable message generation program will get header
506	   folding right, and thus anything so blatant as this malformation is
507	   likely an error caused by a malefactor.

509	   The preferred implementation if option 4 above is not employed is to
510	   apply the following heuristic when this malformation is detected:

512	   1.  Search forward for an empty line.  If one is found, then apply
513	       option 3 above to the anomalous line, and continue.

515	   2.  Search forward for another line that appears to be a new header
516	       field (a name followed by a colon).  If one is found, then apply
517	       option 3 above to the anomalous line, and continue.
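
   The heuristic above might be sketched as follows.  The sketch assumes
   the message has already been split into lines with the terminators
   removed, and the function name and field-name pattern are
   illustrative.  What to do when neither search succeeds is not
   specified above; the sketch leaves the line unchanged in that case,
   which is an assumption.

```python
import re

# A field name (printable ASCII except colon), optional whitespace,
# then a colon.
_FIELD_START = re.compile(r"^[!-9;-~]+[ \t]*:")


def fold_stray_lines(lines: list[str]) -> list[str]:
    """Append non-header lines in the header to the preceding field."""
    out: list[str] = []
    in_header = True
    for i, line in enumerate(lines):
        if not in_header:
            out.append(line)
        elif line == "":
            in_header = False  # empty line: the body starts next
            out.append(line)
        elif line[0] in " \t" or _FIELD_START.match(line):
            out.append(line)  # folded continuation or a header field
        else:
            # Anomalous line: search forward for an empty line or for
            # something that appears to be a new header field.
            found = any(
                later == "" or _FIELD_START.match(later)
                for later in lines[i + 1:]
            )
            if found and out:
                out[-1] = out[-1] + " " + line  # option 3
            else:
                out.append(line)
    return out
```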

519	7.3.  Unusual Spacing

521	   The following message is valid per [MAIL]:

523	       From: user@example.com {1}
524	       To: userpal@example.net {2}
525	       Subject: This is your reminder {3}
526	        {4}
527	        about the football game tonight {5}
528	       Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}

529	       Don't forget to meet us for the tailgate party! {8}

531	   Line {4} contains a single whitespace.  The intended result is that
532	   lines {3}, {4}, and {5} comprise a single continued header field.
533	   However, some agents are aggressive at stripping trailing whitespace,
534	   which will cause line {4} to be treated as an empty line, and thus
535	   the separator line between header and body.  This can affect header-
536	   specific processing algorithms as described in the previous section.

538	   This example was legal in earlier versions of the Internet Mail
539	   format standard, but was rendered obsolete as of [RFC2822] as line
540	   {4} could be interpreted as the separator between the header and
541	   body.

543	   The best handling of this example is for a message parsing engine to
544	   behave as if line {4} was not present in the message and for a
545	   message creation engine to emit the message with line {4} removed.
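
   A parsing engine could behave this way using something like the
   following sketch, which assumes the message is available as a list of
   lines without terminators.  The function name is hypothetical.  Only
   whitespace-only lines that occur before the first truly empty line
   are dropped; the body is left untouched.

```python
def drop_blank_continuations(lines: list[str]) -> list[str]:
    """Remove whitespace-only lines that occur inside the header."""
    out: list[str] = []
    in_header = True
    for line in lines:
        if in_header and line == "":
            in_header = False  # genuine header/body separator
        elif in_header and line.strip(" \t") == "":
            continue  # whitespace-only line: behave as if absent
        out.append(line)
    return out
```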

547	7.4.  Header Malformations

549	   Among the many possible malformations, a common one is insertion of
550	   whitespace at unusual locations, such as:

552	       From: user@example.com {1}
553	       To: userpal@example.net {2}
554	       Subject: This is your reminder {3}
555	       MIME-Version : 1.0 {4}
556	       Content-Type: text/plain {5}
557	       Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}

559	       Don't forget to meet us for the tailgate party! {8}

561	   Note the addition of whitespace in line {4} after the header field
562	   name but before the colon that separates the name from the value.

564	   The obsolete grammar of Section 4 of [MAIL] permits that extra
565	   whitespace, so it cannot be considered invalid.  However, a consensus
566	   of implementations prefers to remove that whitespace.  There is no
567	   perceived change to the semantics of the header field being altered
568	   as the whitespace is itself semantically meaningless.  Therefore, it
569	   is best to remove all whitespace after the field name but before the
570	   colon and to emit the field in this modified form.
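
   A minimal sketch of this normalization, assuming it is applied to the
   first line of each header field; the function name and pattern are
   illustrative.  As noted in Section 4, some systems (such as DKIM) are
   sensitive to this kind of change.

```python
import re

# Field name, then one or more spaces or tabs, then the colon.
_NAME_WSP_COLON = re.compile(r"^([!-9;-~]+)[ \t]+:")


def tighten_field_name(line: str) -> str:
    """Remove whitespace between a field name and its colon."""
    return _NAME_WSP_COLON.sub(r"\1:", line, count=1)
```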

572	7.5.  Header Field Counts

574	   Section 3.6 of [MAIL] prescribes specific header field counts for a
575	   valid message.  Few agents actually enforce these in the sense that a
576	   message whose header contents exceed one or more limits set there is
577	   generally allowed to pass; they typically add any required fields
578	   that are missing, however.

580	   Also, few agents that use messages as input, including Mail User
581	   Agents (MUAs) that actually display messages to users, verify that
582	   the input is valid before proceeding.  Some popular open source
583	   filtering programs and some popular Mailing List Management (MLM)
584	   packages select either the first or last instance of a particular
585	   field name, such as From, to decide who sent a message.  Absent
586	   strict enforcement of [MAIL], an attacker can craft a message with
587	   multiple instances of the same field if that attacker knows
588	   the filter will make a decision based on one but the user will be
589	   shown the others.

591	   This situation is exacerbated when message validity is assessed, such
592	   as through enhanced authentication methods like DomainKeys Identified
593	   Mail [DKIM].  Such methods might cover one instance of a constrained
594	   field but not another, taking the wrong one as "good" or "safe".  An
595	   MUA, for example, could show the first of two From fields to an end
596	   user as "good" or "safe" while an authentication method actually only
597	   verified the second.

599	   In attempting to counter this exposure, one of the following
600	   strategies can be used:

602	   1.  reject outright or refuse to process further any input message
603	       that does not conform to Section 3.6 of [MAIL];

605	   2.  remove or, in the case of an MUA, refuse to render any instances
606	       of a header field whose presence exceeds a limit prescribed in
607	       Section 3.6 of [MAIL] when generating its output;

609	   3.  where a field has a limited instance count, combine additional
610	       instances into a single instance carrying the same information as
611	       the multiple instances;

613	   4.  where a field can contain multiple distinct values (such as From)
614	       or is free-form text (such as Subject), combine them into a
615	       semantically identical single header field of the same name (see
616	       Section 7.5.1);

618	   5.  alter the name of any header field whose presence exceeds a limit
619	       prescribed in Section 3.6 of [MAIL] when generating its output so
620	       that later agents can produce a consistent result.  Any
621	       alteration likely to cause the field to be ignored by downstream
622	       agents is acceptable.  A common approach is to prefix the field
623	       names with a string such as "BAD-".

625	   Selecting a mitigation action from the above list, or some other
626	   action, must consider the needs of the operator making the decision,
627	   and the nature of its user base.
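
   As an illustration of strategy 5 above, the following is a minimal
   sketch rather than a definitive implementation.  The function name,
   the representation of a header as a list of (name, value) pairs, and
   the choice to keep the first instance are assumptions made for this
   example; the set of limited fields reflects the single-instance
   fields of Section 3.6 of [MAIL].

```python
# Hypothetical sketch of strategy 5: rename surplus instances of fields
# that Section 3.6 of [MAIL] limits to a single occurrence.
SINGLE_INSTANCE = {
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
}


def rename_excess_fields(fields, prefix="BAD-"):
    """Prefix the names of surplus instances of limited fields.

    fields is a list of (name, value) pairs in message order.  The first
    instance of each limited field is kept (an assumption of this
    sketch); later instances are renamed so that downstream agents are
    likely to ignore them.
    """
    seen = set()
    result = []
    for name, value in fields:
        key = name.lower()
        if key in SINGLE_INSTANCE and key in seen:
            result.append((prefix + name, value))
        else:
            seen.add(key)
            result.append((name, value))
    return result
```

   Fields that may legitimately repeat, such as Received, pass through
   unchanged.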

629	7.5.1.  Repeated Header Fields

631	   There are some occasions where repeated fields are encountered where
632	   only one is expected.  Two examples are presented.  First:

634	       From: reminders@example.com {1}
635	       To: jqpublic@example.com {2}
636	       Subject: Automatic Meeting Reminder {3}
637	       Subject: 4pm Today -- Staff Meeting {4}
638	       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

640	       Reminder of the staff meeting today in the small {6}
641	       auditorium.  Come early! {7}

643	   The message above has two Subject fields, which is in violation of
644	   Section 3.6 of [MAIL].  A safe interpretation of this would be to
645	   treat it as though the two Subject field values were concatenated, so
646	   long as they are not identical, such as:

648	       From: reminders@example.com {1}
649	       To: jqpublic@example.com {2}
650	       Subject: Automatic Meeting Reminder {3}
651	         4pm Today -- Staff Meeting {4}
652	       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

654	       Reminder of the staff meeting today in the small {6}
655	       auditorium.  Come early! {7}

657	   Second:

659	       From: president@example.com {1}
660	       From: vice-president@example.com {2}
661	       To: jqpublic@example.com {3}
662	       Subject: A note from the E-Team {4}
663	       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

665	       This memo is to remind you of the corporate dress {6}
666	       code.  Attached you will find an updated copy of {7}
667	       the policy. {8}
668	       ...

670	   As with the first example, there is a violation in terms of the
671	   number of instances of the From field.  A likely safe interpretation
672	   would be to combine these into a comma-separated address list in a
673	   single From field:

675	       From: president@example.com, {1}
676	             vice-president@example.com {2}
677	       To: jqpublic@example.com {3}
678	       Subject: A note from the E-Team {4}
679	       Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}

681	       This memo is to remind you of the corporate dress {6}
682	       code.  Attached you will find an updated copy of {7}
683	       the policy. {8}
684	       ...
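
   The two interpretations above can be sketched as follows.  This is
   an illustrative example only: the function name and the (name,
   value) list representation are assumptions, values are taken to be
   already unfolded, and only Subject and From are handled.

```python
# Hypothetical sketch: merge repeated Subject and From fields in the
# manner of the two examples above.
SEPARATORS = {"subject": " ", "from": ", "}


def combine_repeated(fields):
    """Return fields with repeated Subject/From instances combined.

    fields is a list of (name, value) pairs in message order.  Identical
    values are not repeated in the combined field.
    """
    distinct = {}   # lowercased name -> distinct values, in order seen
    slots = []      # output order; None marks a value filled in below
    for name, value in fields:
        key = name.lower()
        if key not in SEPARATORS:
            slots.append((name, value))
        elif key not in distinct:
            distinct[key] = [value]
            slots.append((name, None))
        elif value not in distinct[key]:
            distinct[key].append(value)

    combined = []
    for name, value in slots:
        if value is None:
            key = name.lower()
            value = SEPARATORS[key].join(distinct[key])
        combined.append((name, value))
    return combined
```

   The combined field takes the position of the first instance, which
   matches the layout of the examples.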

686	7.5.2.  Missing Header Fields

688	   Similar to the previous section, there are messages seen in the wild
689	   that lack certain required header fields.  In particular, [MAIL]
690	   requires that a From and Date field be present in all messages.

692	   When presented with a message lacking these fields, the MTA might
693	   perform one of the following:

695	   1.  Make no changes

697	   2.  Add an instance of the missing field(s) using synthesized content
698	       based on data provided in other parts of the protocol

700	   Option 2 is recommended for handling this case.  Handling agents
701	   should add these for internal handling if they are missing, but
702	   should not add them to the external representation.  The reason for
703	   this advice is that there are some filter modules that would consider
704	   the absence of such fields to be a condition warranting special
705	   treatment (for example, rejection), and thus the effectiveness of
706	   such modules would be stymied by an upstream filter adding them in a
707	   way visible to other components.

709	   The synthesized fields should contain a best guess as to what should
710	   have been there; for From, the SMTP MAIL command's address can be
711	   used (if not null) or a placeholder address followed by an address
712	   literal (for example, unknown@[192.0.2.1]); for Date, a date
713	   extracted from a Received field is a reasonable choice.
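
   A minimal sketch of this synthesis follows.  The function names are
   hypothetical, the placeholder local-part "unknown" is taken from the
   example above, and the choice to use the first Received field with a
   parsable date is an assumption; an operator might prefer a different
   one.

```python
from email.utils import format_datetime, parsedate_to_datetime


def synthesize_from(mail_from, client_ip):
    """Best-guess From value for internal handling only.

    mail_from is the address from the SMTP MAIL command ("" or "<>" when
    null); client_ip is the connecting client's address.
    """
    if mail_from and mail_from != "<>":
        return mail_from
    return "unknown@[%s]" % client_ip


def synthesize_date(received_values):
    """Return a Date value taken from a Received field, or None.

    received_values holds unfolded Received field values in message
    order.  The date-time follows the last ";" in each value.
    """
    for value in received_values:
        _, sep, stamp = value.rpartition(";")
        if not sep:
            continue
        try:
            return format_datetime(parsedate_to_datetime(stamp.strip()))
        except (TypeError, ValueError):
            continue  # unparsable date; try the next Received field
    return None
```

   Consistent with the advice above, the results would be attached to
   the internal representation of the message and not written back to
   the message itself.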

715	   One other important case to consider is a missing Message-Id field.
716	   An MTA that encounters a message missing this field should synthesize
717	   a valid one and add it to the external representation, since many
718	   deployed tools use the content of that field as a common unique
719	   message reference, so its absence inhibits correlation of message
720	   processing.  Section 3.6.4 of [MAIL] describes advisable practice for
721	   synthesizing the content of this field when it is absent, and
722	   establishes a requirement that it be globally unique.
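
   One possible way to do this, shown here only as a sketch, uses the
   make_msgid function from the Python standard library.  The wrapper
   name, the (name, value) list representation, and the domain passed
   in are assumptions for illustration.

```python
from email.utils import make_msgid


def ensure_message_id(fields, domain):
    """Return fields with a synthesized Message-ID if none is present.

    fields is a list of (name, value) pairs; domain is a name under the
    control of the MTA adding the field (hypothetical here).
    """
    if any(name.lower() == "message-id" for name, _ in fields):
        return list(fields)
    return list(fields) + [("Message-ID", make_msgid(domain=domain))]
```

   Unlike the synthesized From and Date fields, this one is intended to
   appear in the external representation of the message.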

724	7.5.3.  Return-Path

726	   A valid message will have exactly one Return-Path header field, as
727	   per Section 4.4 of [SMTP].  Should a message be encountered bearing
728	   more than one, all but the topmost one are to be disregarded, as the
729	   topmost one is most likely to have been added nearest to the mailbox
730	   that received that message.
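
   In code, this amounts to taking the first Return-Path encountered
   when reading the header from the top.  The sketch below assumes a
   hypothetical list of (name, value) pairs in message order.

```python
def effective_return_path(fields):
    """Return the value of the topmost Return-Path field, or None."""
    for name, value in fields:
        if name.lower() == "return-path":
            return value
    return None
```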

732	7.6.  Missing or Incorrect Charset Information

734	   MIME provides the means to include textual material employing
735	   character sets ("charsets") other than US-ASCII.  Such material is
736	   required to have an identified charset.  Charset identification is
737	   done using a "charset" parameter in the Content-Type header field, a
738	   charset label within the MIME entity itself, or the charset can be
739	   implicitly specified by the Content-Type (see [CHARSET]).

741	   It is unfortunately fairly common for required character set
742	   information to be missing or incorrect in textual MIME entities.  As
743	   such, processing agents should perform basic sanity checks, such as:

745	   o  US-ASCII contains bytes between 1 and 127 inclusive only
746	      (colloquially, "7-bit" data), so material including bytes outside
747	      of that range ("8-bit" data) is necessarily not US-ASCII.  (See
748	      Section 2.3.1 of [MAIL].)

750	   o  [UTF-8] has a very specific syntactic structure that other 8-bit
751	      charsets are unlikely to follow.

753	   o  Null bytes (ASCII 0x00) are not allowed in either 7-bit or 8-bit
754	      data.

756	   o  Not all 7-bit material is US-ASCII.  The presence of the various
757	      escape sequences used for character switching can be used as an
758	      indication of the various charsets based on ISO/IEC 2022, such as
759	      those defined in [ISO-2022-CN], [ISO-2022-JP], and [ISO-2022-KR].

761	   When a character set error is detected, processing agents should:

763	   a.  apply heuristics to determine the most likely character set and,
764	       if successful, proceed using that information; or

766	   b.  refuse to process the malformed MIME entity.
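
   The sanity checks and options (a) and (b) above might be sketched as
   follows.  This is a deliberately small example, not a complete
   detector: the function names are hypothetical, only a handful of
   designator escape sequences from [ISO-2022-JP], [ISO-2022-KR], and
   [ISO-2022-CN] are recognized, and other 8-bit charsets are not
   guessed at all.

```python
# A few ISO/IEC 2022 designator escape sequences (illustrative subset).
ISO2022_ESCAPES = (
    (b"\x1b$B", "iso-2022-jp"),
    (b"\x1b$@", "iso-2022-jp"),
    (b"\x1b$)C", "iso-2022-kr"),
    (b"\x1b$)A", "iso-2022-cn"),
    (b"\x1b$)G", "iso-2022-cn"),
)


def _is_utf8(data):
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def _iso2022_label(data):
    for escape, label in ISO2022_ESCAPES:
        if escape in data:
            return label
    return None


def charset_problems(data, declared):
    """List ways in which data (bytes) contradicts its declared charset."""
    label = declared.lower()
    has_8bit = any(byte > 127 for byte in data)
    problems = []
    if b"\x00" in data:
        problems.append("null byte present")
    if label == "us-ascii" and has_8bit:
        problems.append("8-bit data labeled as US-ASCII")
    if label == "utf-8" and not _is_utf8(data):
        problems.append("not well-formed UTF-8")
    if label == "us-ascii" and _iso2022_label(data):
        problems.append("ISO/IEC 2022 escape sequences in US-ASCII")
    return problems


def guess_charset(data):
    """Option (a): return a likely charset, or None to signal option (b)."""
    iso2022 = _iso2022_label(data)
    if iso2022:
        return iso2022
    if all(byte <= 127 for byte in data):
        return "us-ascii"
    if _is_utf8(data):
        return "utf-8"
    return None
```

   A caller would run charset_problems first and fall back to
   guess_charset only when a problem is reported, refusing the entity
   when no guess is available.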

768	   A null byte inside a textual MIME entity can cause typical string
769	   processing functions to mis-identify the end of a string, which can
770	   be exploited to hide malicious content from analysis processes.
771	   Accordingly, null bytes require additional special handling.

773	   A few null bytes in isolation are likely to be the result of poor
774	   message construction practices.  Such nulls should be silently
775	   dropped.

777	   Large numbers of null bytes are usually the result of binary material
778	   that is improperly encoded, improperly labeled, or both.  Such
779	   material is likely to be damaged beyond the hope of recovery, so the
780	   best course of action is to refuse to process it.

782	   Finally, the presence of null bytes may be used as indication of
783	   possible malicious intent.
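
   The null-byte handling described above can be sketched as below.
   The text does not define how many null bytes count as "a few", so
   the threshold used here is purely an assumption, as is the function
   name; counting nulls is also a simplification of "in isolation".

```python
def handle_nulls(data, max_dropped=8):
    """Drop a small number of null bytes; refuse large numbers.

    Returns the cleaned bytes, or None when the entity should not be
    processed.  max_dropped is an arbitrary, operator-chosen threshold.
    """
    count = data.count(b"\x00")
    if count == 0:
        return data
    if count <= max_dropped:
        return data.replace(b"\x00", b"")
    return None
```

   An agent might additionally record that nulls were seen, since their
   presence can itself be a signal worth passing to later analysis.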

785	7.7.  Eight-Bit Data

787	   Standards-compliant email messages do not contain any non-ASCII data
788	   without indicating that such content is present by means of published
789	   SMTP extensions.  Absent that, MIME encodings are typically used to
790	   convert non-ASCII data to ASCII in a way that can be reversed by
791	   other handling agents or end users.

793	   The best way to handle non-compliant 8bit material depends on its
794	   location.

796	   Non-compliant 8bit material in MIME entity content should simply be
797	   processed as if the necessary SMTP extensions had been used to
798	   transfer the message.  Note that improperly labeled 8bit material in
799	   textual MIME entities may require treatment as described in
800	   Section 7.6.

802	   Non-compliant 8bit material in message or MIME entity header fields
803	   can be handled as follows:

805	   o  Occurrences in unstructured text fields, comments, and phrases
806	      can be converted into encoded-words (see [MIME3]) if a likely
807	      character set can be determined.  Alternatively, 8bit characters
808	      can be removed or replaced with some other character.

810	   o  Occurrences in header fields whose syntax is unknown may be
811	      handled by dropping the field entirely or by removing/replacing
812	      the 8bit character as described above.

814	   o  Occurrences in addresses are especially problematic.  Agents
815	      supporting [EAI] may, if the 8bit material conforms to 8bit
816	      syntax, elect to treat the message as an EAI message and process
817	      it accordingly.  Otherwise, it is in most cases best to exclude
818	      the address from any sort of processing -- which may mean dropping
819	      it entirely -- since any attempt to fix it definitively is
820	      unlikely to be successful.
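
   For the unstructured-text case, a repair along these lines is
   possible.  This is a sketch only: the function name is hypothetical,
   the likely charset is assumed to have been determined elsewhere, and
   "?" is an arbitrary replacement character.

```python
from email.header import Header


def repair_unstructured(raw, likely_charset=None):
    """Return an ASCII-only value for an unstructured header field.

    raw is the field value as bytes.  If a likely charset is supplied
    and the bytes decode under it, the value is converted to
    encoded-words; otherwise each 8bit byte is replaced with "?".
    """
    if all(byte < 128 for byte in raw):
        return raw.decode("ascii")
    if likely_charset:
        try:
            text = raw.decode(likely_charset)
        except (UnicodeDecodeError, LookupError):
            text = None
        if text is not None:
            return Header(text, likely_charset).encode()
    return bytes(b if b < 128 else ord("?") for b in raw).decode("ascii")
```

   Address fields are intentionally not handled here, in keeping with
   the caution in the last bullet above.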

822	8.  MIME Anomalies

824	   The five-part set of MIME specifications includes a mechanism of
825	   message extensions for providing text in character sets other than
826	   ASCII, non-text attachments to messages, multi-part message bodies,
827	   and similar facilities.

829	   Some anomalies with MIME-compliant generation are also common.  This
830	   section discusses some of those and presents preferred mitigations.

832	8.1.  Missing MIME-Version Field

834	   Any message that uses [MIME] constructs is required to have a MIME-
835	   Version header field.  Without it, the Content-Type and associated
836	   fields have no semantic meaning.

838	   It is often observed that a message has complete MIME structure, yet
839	   lacks this header field.  It is prudent to disregard this absence and
840	   conduct analysis of the message as if it were present, especially by
841	   agents attempting to identify malicious material.

843	   Further, the absence of MIME-Version might be an indication of
844	   malicious intent, and extra scrutiny of the message may be warranted.
845	   Such omissions are not expected from compliant message generators.
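
   A simple check in this spirit is sketched below.  The function name
   and return shape are assumptions, and treating any field whose name
   begins with "Content-" as evidence of MIME structure is a heuristic
   chosen for this example.

```python
def mime_status(fields):
    """Return (analyze_as_mime, missing_version) for a header.

    fields is a list of (name, value) pairs.  A message with Content-*
    fields is analyzed as MIME even when MIME-Version is absent, and the
    absence is reported so that extra scrutiny can be applied.
    """
    names = {name.lower() for name, _ in fields}
    has_version = "mime-version" in names
    has_mime_fields = any(n.startswith("content-") for n in names)
    analyze_as_mime = has_version or has_mime_fields
    missing_version = has_mime_fields and not has_version
    return analyze_as_mime, missing_version
```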

847	8.2.  Faulty Encodings

849	   There have been a few different specifications of base64 in the past.
850	   The implementation defined in [MIME] instructs decoders to discard
851	   characters that are not part of the base64 alphabet.  Other
852	   implementations consider an encoded body containing such characters
853	   to be completely invalid.  Very early specifications of base64 (see
854	   [PEM], for example) allowed email-style comments within base64-
855	   encoded data.

857	   The attack vector here involves constructing a base64 body whose
858	   meaning varies given different possible decodings.  If a security
859	   analysis module wishes to be thorough, it should consider scanning
860	   the possible outputs of the known decoding dialects in an attempt to
861	   anticipate how the MUA will interpret the data.
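
   The following sketch decodes a body under two of the dialects
   mentioned above so that an analysis module can scan each result.
   The function and dialect names are hypothetical, and the older
   comment-bearing dialect is not modeled.

```python
import base64
import binascii
import re


def decode_dialects(body):
    """Decode base64 bytes under a strict and a lenient dialect.

    Returns a dict mapping a dialect name to the decoded bytes, or to
    None when that dialect would reject the body.
    """
    results = {}

    # Strict: only line breaks may be removed; anything else outside
    # the base64 alphabet makes the whole body invalid.
    compact = body.replace(b"\r", b"").replace(b"\n", b"")
    try:
        results["strict"] = base64.b64decode(compact, validate=True)
    except binascii.Error:
        results["strict"] = None

    # Lenient, as [MIME] instructs: discard characters that are not
    # part of the base64 alphabet before decoding.
    filtered = re.sub(rb"[^A-Za-z0-9+/=]", b"", body)
    try:
        results["discard"] = base64.b64decode(filtered, validate=True)
    except binascii.Error:
        results["discard"] = None

    return results
```

   When the two results differ, or one dialect rejects what the other
   accepts, the body merits closer inspection.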

863	9.  Body Anomalies

865	9.1.  Oversized Lines

867	   A message containing a line of content that exceeds 998 characters
868	   plus the line terminator (1000 total) violates Section 2.1.1 of
869	   [MAIL].  Some handling agents may not look at content in a single
870	   line past the first 998 bytes, providing bad actors an opportunity to
871	   hide malicious content.

873	   There is no specified way to handle such messages, other than to
874	   observe that they are non-compliant and reject them, or rewrite the
875	   oversized line such that the message is compliant.

877	   To ensure long lines do not prevent analysis of potentially malicious
878	   data, handling agents are strongly encouraged to take one of the
879	   following actions:

881	   1.  Break such lines into multiple lines at a position that does not
882	       change the semantics of the text being thus altered.  For
883	       example, breaking an oversized line such that a [URI] then spans
884	       two lines could inhibit the proper identification of that URI.

886	   2.  Rewrite the MIME part (or the entire message if not MIME) that
887	       contains the excessively long line using a content encoding that
888	       breaks the line in the transmission but would still result in the
889	       line being intact on decoding for presentation to the user.  Both
890	       of the encodings declared in [MIME] can accomplish this.
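
   Option 2 might look like the sketch below, which uses base64; the
   quoted-printable encoding would serve equally well for mostly
   textual content.  The function names are hypothetical, and the
   caller is assumed to set Content-Transfer-Encoding to the returned
   value and to convert line endings to CRLF before transmission.

```python
import base64

MAX_LINE = 998  # limit from Section 2.1.1 of [MAIL], excluding CRLF


def has_oversized_line(body):
    """True if any line of body (bytes) exceeds MAX_LINE bytes."""
    return any(len(line) > MAX_LINE for line in body.splitlines())


def reencode_oversized(body):
    """Return (body, encoding) with oversized lines made compliant.

    The body is returned unchanged with encoding None when every line is
    already within the limit.  Otherwise it is base64-encoded, which
    yields short lines in transit and restores the original line intact
    on decoding.
    """
    if not has_oversized_line(body):
        return body, None
    return base64.encodebytes(body), "base64"
```

   Because the scan itself reads whole lines, an analysis module built
   this way does not stop at the first 998 bytes of a line.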

892	10.  Security Considerations

894	   The discussions of the anomalies above and their prescribed solutions
895	   are themselves security considerations.  The practices enumerated in
896	   this document are generally perceived as attempts to resolve security
897	   considerations that already exist rather than introducing new ones.
898	   However, some of the attacks described here may not have appeared in
899	   previous email specifications.

901	11.  IANA Considerations

903	   This document contains no actions for IANA.

905	   [RFC Editor: Please remove this section prior to publication.]

907	12.  References
908	12.1.  Normative References

910	   [EMAIL-ARCH]   Crocker, D., "Internet Mail Architecture", RFC 5598,
911	                  July 2009.

913	   [MAIL]         Resnick, P., "Internet Message Format", RFC 5322,
914	                  October 2008.

916	   [MIME]         Freed, N. and N. Borenstein, "Multipurpose Internet
917	                  Mail Extensions (MIME) Part One: Format of Internet
918	                  Message Bodies", RFC 2045, November 1996.

920	12.2.  Informative References

922	   [BINARYSMTP]   Vaudreuil, G., "SMTP Service Extensions for
923	                  Transmission of Large and Binary MIME Messages",
924	                  RFC 3030, December 2000.

926	   [CHARSET]      Melnikov, A. and J. Reschke, "Update to MIME regarding
927	                  "charset" Parameter Handling in Textual Media Types",
928	                  RFC 6657, July 2012.

930	   [DKIM]         Crocker, D., Ed., Hansen, T., Ed., and M. Kucherawy,
931	                  Ed., "DomainKeys Identified Mail (DKIM) Signatures",
932	                  RFC 6376, September 2011.

934	   [DSN]          Moore, K. and G. Vaudreuil, "An Extensible Message
935	                  Format for Delivery Status Notifications", RFC 3464,
936	                  January 2003.

938	   [EAI]          Yang, A., Steele, S., and N. Freed, "Internationalized
939	                  Email Headers", RFC 6532, February 2012.

941	   [ISO-2022-CN]  Zhu, HF., Hu, DY., Wang, ZG., Kao, TC., Chang, WCH.,
942	                  and M. Crispin, "Chinese Character Encoding for
943	                  Internet Messages", RFC 1922, March 1996.

945	   [ISO-2022-JP]  Murai, J., Crispin, M., and E. van der Poel, "Japanese
946	                  Character Encoding for Internet Messages", RFC 1468,
947	                  June 1993.

949	   [ISO-2022-KR]  Choi, U., Chon, K., and H. Park, "Korean Character
950	                  Encoding for Internet Messages", RFC 1557,
951	                  December 1993.

953	   [MIME3]        Moore, K., "MIME (Multipurpose Internet Mail
954	                  Extensions) Part Three: Message Header Extensions for
955	                  Non-ASCII Text", RFC 2047, November 1996.

957	   [PEM]          Linn, J., "Privacy Enhancement for Internet Electronic
958	                  Mail: Part I -- Message Encipherment and
959	                  Authentication Procedures", RFC 1113, August 1989.

961	   [RFC1122]      Braden, R., Ed., "Requirements for Internet Hosts --
962	                  Communication Layers", RFC 1122, October 1989.

964	   [RFC2822]      Resnick, P., Ed., "Internet Message Format", RFC 2822,
965	                  April 2001.

967	   [RFC733]       Crocker, D., Vittal, J., Pogran, K., and D. Henderson,
968	                  Jr., "Standard for the Format of Internet Text
969	                  Messages", RFC 733, November 1977.

971	   [SMTP]         Klensin, J., "Simple Mail Transfer Protocol",
972	                  RFC 5321, October 2008.

974	   [URI]          Berners-Lee, T., Fielding, R., and L. Masinter,
975	                  "Uniform Resource Identifier (URI): Generic Syntax",
976	                  RFC 3986, January 2005.

978	   [UTF-8]        Yergeau, F., "UTF-8, a transformation format of ISO
979	                  10646", RFC 3629, November 2003.

981	Appendix A.  RFC Editor Notes

983	   [RFC Editor Note: This section can be removed before publication.]

985	   I can't seem to figure out how to do this with xml2rfc, but the ISO-
986	   2022 reference above should contain the following URI:
987	   http://www.iso.org/iso/catalogue_detail.htm?csnumber=22747

989	Appendix B.  Acknowledgements

991	   The authors wish to acknowledge the following for their review and
992	   constructive criticism of this proposal: Dave Cridland, Dave Crocker,
993	   Jim Galvin, Tony Hansen, John Levine, Franck Martin, Alexey Melnikov,
994	   and Timo Sirainen.

996	Authors' Addresses

998	   Murray S. Kucherawy

1000	   EMail: superuser@gmail.com

1001	   Gregory N. Shapiro

1003	   EMail: gshapiro@proofpoint.com

1005	   N. Freed

1007	   EMail: ned.freed@mrochek.com