Network Working Group Jutta Degener Internet Draft Philip Guenther Intended status: Standards Track Sendmail, Inc. Expires: August 2007 February 2007 Updates: RFC-ietf-sieve-variables-08 Sieve Email Filtering: Body Extension draft-ietf-sieve-body-06.txt Status of this memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet- Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html Copyright Notice Copyright (C) The IETF Trust (2007). Abstract This document defines a new command for the "Sieve" email filtering language that tests for the occurrence of one or more strings in the body of an email message. Degener & Guenther Standards Track [Page 1] Internet-Draft Sieve Email Filtering: Body Extension February 2007 1. Introduction The proposed "body" test checks for the occurrence of one or more strings in the body of an email message. Such a test was initially discussed for the [SIEVE] base document, but was subsequently removed because it was thought to be too costly to implement. Nevertheless, several server vendors have implemented some form of the "body" test. This document reintroduces the "body" test as an extension, and specifies its syntax and semantics. 2. Conventions used. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [KEYWORDS]. Conventions for notations are as in [SIEVE] section 1.1, including use of the "Usage:" label for the definition of text and tagged arguments syntax. The capability string associated with the extension defined in this document is "body". 3. Test body Usage: "body" [COMPARATOR] [MATCH-TYPE] [BODY-TRANSFORM] The body test matches text in the body of an email message, that is, anything following the first empty line after the header. (The empty line itself, if present, is not considered to be part of the body.) The COMPARATOR and MATCH-TYPE keyword parameters are defined in [SIEVE]. The BODY-TRANSFORM is a keyword parameter discussed in section 4, below. If a message consists of a header only, not followed by an empty line, all "body" tests return false, including that for an empty string. Degener & Guenther Standards Track [Page 2] Internet-Draft Sieve Email Filtering: Body Extension February 2007 If a message consists of a header followed only by an empty line with no body lines following it, the message is considered to have an empty string as a body. 4. Body Transform Prior to matching text in a message body, "transformations" can be applied that filter and decode certain parts of the body. These transformations are selected by a "BODY-TRANSFORM" keyword parameter. Usage: ":raw" / ":content" / ":text" The default transformation is :text. 4.1 Body Transform ":raw" The ":raw" transform is intended to match against the undecoded body of a message. If the specified body-transform is ":raw", the [MIME] structure of the body is irrelevant. The implementation MUST NOT remove any transfer encoding from the message, MUST NOT refuse to filter messages with syntactic errors (unless the environment it is part of rejects them outright), and MUST treat multipart boundaries or the MIME headers of enclosed body parts as part of the text being matched against instead of MIME structures to interpret. Example: require "body"; # This will match a message containing the literal text # "MAKE MONEY FAST" in body parts (ignoring any # content-transfer-encodings) or MIME headers other than # the outermost RFC 2822 header. if body :raw :contains "MAKE MONEY FAST" { discard; } Degener & Guenther Standards Track [Page 3] Internet-Draft Sieve Email Filtering: Body Extension February 2007 4.2 Body Transform ":content" If the body transform is ":content", only MIME parts that have the specified content-types are selected for matching. If an individual content type begins or ends with a '/' (slash) or contains multiple slashes, it matches no content types. Otherwise, if it contains a slash, then it specifies a full / pair, and matches only that specific content type. If it is the empty string, all MIME content types are matched. Otherwise, it specifies a only, and any subtype of that type matches it. The search for MIME parts matching the :content specification is recursive and automatically descends into multipart and message/rfc822 MIME parts. All MIME parts with matching types are searched for the key strings. The test returns true if any combination of searched MIME part and key-list argument match. If the :content specification matches a multipart MIME part, only the prologue and epilogue sections of the part will be searched for the key strings; the contents of nested parts are only searched if their respective types match the :content specification. If the :content specification matches a message/rfc822 MIME part, only the header of the nested message will be searched for the key strings; the contents of the nested message body parts are only searched if its content-type matches the :content specification. (Matches against container types with an empty match string can be useful as tests for the existence of such parts.) Degener & Guenther Standards Track [Page 4] Internet-Draft Sieve Email Filtering: Body Extension February 2007 Example: From: Whomever To: Someone Date: Whenever Subject: whatever Content-Type: multipart/mixed; boundary=outer & This is a multi-part message in MIME format. & --outer Content-Type: multipart/alternative; boundary=inner & This is a nested multi-part message in MIME format. & --inner Content-Type: text/plain; charset="us-ascii" $ Hello $ --inner Content-Type: text/html; charset="us-ascii" % Hello % --inner-- & & This is the end of the inner MIME multipart. & --outer Content-Type: message/rfc822 ! From: Someone Else ! Subject: hello request $ Please say Hello $ --outer-- & & This is the end of the outer MIME multipart. In the above example, the '&', '$', '%', and '!' characters at the start of a line are used to illustrate what portions of the example message are used in tests: - the lines starting with '&' are the ones that are tested when a 'body :content "multipart" :contains "MIME"' test is executed. Degener & Guenther Standards Track [Page 5] Internet-Draft Sieve Email Filtering: Body Extension February 2007 - the lines starting with '$' are the ones that are tested when a 'body :content "text/plain" :contains "Hello"' test is executed. - the lines starting with '%' are the ones that are tested when a 'body :content "text/html" :contains "Hello"' test is executed. - the lines starting with '$' or '%' are the ones that are tested when a 'body :content "text" :contains "Hello"' test is executed. - the lines starting with '!' are the ones that are tested when a 'body :content "message/rfc822" :contains "Hello"' test is executed. Comparisons are performed on octets. Implementations decode the content-transfer-encoding and convert text to [UTF-8] as input to the comparator. MIME parts that cannot be decoded and converted MAY be treated as plain US-ASCII, omitted, or processed according to local conventions. A NUL octet (character zero) SHOULD NOT cause early termination of the content being compared against. Implementations MUST support the "quoted-printable", "base64", "7bit", "8bit", and "binary" content transfer encodings. Implementations MUST be capable of converting to UTF-8 the US-ASCII, ISO-8859-1, and the US-ASCII subset of ISO-8859-* character sets. Search expressions MUST NOT match across MIME part boundaries. MIME headers of the containing part MUST NOT be included in the data. Example: require ["body", "fileinto"]; # Save any message with any text MIME part that contains the # words "missile" or "coordinates" in the "secrets" folder. if body :content "text" :contains ["missile", "coordinates"] { fileinto "secrets"; } # Save any message with an audio/mp3 MIME part in # the "jukebox" folder. if body :content "audio/mp3" :contains "" { fileinto "jukebox"; } Degener & Guenther Standards Track [Page 6] Internet-Draft Sieve Email Filtering: Body Extension February 2007 4.3 Body Transform ":text" The ":text" body transform matches against the results of an implementation's best effort at extracting UTF-8 encoded text from a message. In simple implementations, :text MAY be treated the same as :content "text". Sophisticated implementations MAY strip mark-up from the text prior to matching, and MAY convert media types other than text to text prior to matching. (For example, they may be able to convert proprietary text editor formats to text or apply optical character recognition algorithms to image data.) Example: require ["body", "fileinto"]; # Save messages mentioning the project schedule in the # project/schedule folder. if body :text :contains "project schedule" { fileinto "project/schedule"; } 5. Interaction with Other Sieve Extensions Any extension that extends the grammar for the COMPARATOR or MATCH-TYPE nonterminals will also affect the implementation of "body". Wildcard expressions used with "body" are exempt from the side effects described in [VARIABLES]. That is, they MUST NOT set match variables (${1}, ${2}...) to the input values corresponding to wild card sequences in the matched pattern. However, if the extension is present, variable references in the key strings or content type strings are evaluated as described in the draft. Degener & Guenther Standards Track [Page 7] Internet-Draft Sieve Email Filtering: Body Extension February 2007 6. IANA Considerations The following template specifies the IANA registration of the Sieve extension specified in this document: To: iana@iana.org Subject: Registration of new Sieve extension Capability name: body Description: adds the 'body' test for matching against the the body of the message being processed RFC number: this RFC Contact Address: Jutta Degener This information should be added to the list of sieve extensions given on http://www.iana.org/assignments/sieve-extensions. 7. Security Considerations The system MUST be sized and restricted in such a manner that even malicious use of body matching does not deny service to other users of the host system. Filters relying on string matches in the raw body of an email message may be more general than intended. Text matches are no replacement for a spam, virus, or other security related filtering system. 8. Acknowledgments This document has been revised in part based on comments and discussions that took place on and off the SIEVE mailing list. Thanks to Cyrus Daboo, Ned Freed, Bob Johannessen, Simon Josefsson, Mark E. Mallett, Chris Markle, Alexey Melnikov, Ken Murchison, Greg Shapiro, Tim Showalter, Nigel Swinson, and Dowson Tong for reviews and suggestions. Degener & Guenther Standards Track [Page 8] Internet-Draft Sieve Email Filtering: Body Extension February 2007 9. Authors' Addresses Jutta Degener 5245 College Ave, Suite #127 Oakland, CA 94618 Email: jutta@pobox.com Philip Guenther Sendmail, Inc. 6425 Christie Ave, 4th Floor Emeryville, CA 94608 Email: guenther@sendmail.com 10. Discussion This section will be removed when this document leaves the Internet-Draft stage. This draft is intended as an extension to the Sieve mail filtering language. Sieve extensions are discussed on the MTA Filters mailing list at . Subscription requests can be sent to (send an email message with the word "subscribe" in the body). More information on the mailing list along with a WWW archive of back messages is available at . 10.1 Changes from draft-ietf-sieve-body-05.txt Updated boilerplate to match RFC 4748. Added "Intended-Status: Standards Track" and "Updates: draft-ietf-sieve-variables-08" Change the references from appendices to sections. Update [SIEVE] reference. 10.2 Changes from draft-ietf-sieve-body-04.txt Changed 'reject' to 'discard' in the example. Removed reference to regex draft. Update copyright boilerplate. Degener & Guenther Standards Track [Page 9] Internet-Draft Sieve Email Filtering: Body Extension February 2007 10.3 Changes from draft-ietf-sieve-body-03.txt Update IANA registration to match 3028bis. Added direct boilerplate for [KEYWORDS]. 10.4 Changes from draft-ietf-sieve-body-02.txt Updated charset conversion to match draft-ietf-sieve-3028bis-06.txt. Change "Syntax:" to "Usage:". Updated references. 10.5 Changes from draft-ietf-sieve-body-01.txt Updated charset conversion requirements to match those in draft-ietf-sieve-3028bis-03.txt for headers. 10.6 Changes from draft-ietf-sieve-body-00.txt Updated IPR boilerplate to RFC 3978/3979. Many prose corrections in response to WGLC comments. Of particular note: - made clear that :raw treats MIME boundaries and headers as text to be matched against - corrected description in comment of :raw example - clarified the interpretation of invalid content-types in :content - gave precise description of what gets matched when :content is used with message/rfc822 or any multipart type, as well as a comprehensive example - include an example of :text - tightened wording of interaction with [VARIABLES] - added informative reference to [REGEX] Degener & Guenther Standards Track [Page 10] Internet-Draft Sieve Email Filtering: Body Extension February 2007 10.7 Changes from draft-degener-sieve-body-04.txt Renamed to draft-ietf-sieve-body-00.txt; tweaked the title and abstract. Added Philip Guenther as co-author. Split references into normative and informative. Updated [UTF-8] and [VARIABLES] references. Updated IPR boilerplate. 10.8 Changes from draft-degener-sieve-body-03.txt Made "body" exempt from variable-setting side effects in the presence of the "variables" extension and wild cards. It's too hard to implement. Removed :binary. It's uglier and less useful than it needs to be to bother. Added IANA section. 11. Normative References [KEYWORDS] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 15, RFC 2119, March 1997. [MIME] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996. [SIEVE] Guenther, P. and T. Showalter, "Sieve: A Mail Filtering Language", draft-ietf-sieve-3028bis-12, February 2007. [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC 3629, November 2003. 12. Informative References [VARIABLES] Homme, K. T., "Sieve Extension: Variables", draft-ietf-sieve-variables-08.txt, December 2005. Degener & Guenther Standards Track [Page 11] Internet-Draft Sieve Email Filtering: Body Extension February 2007 Full Copyright Statement Copyright (C) The IETF Trust (2007). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Intellectual Property The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Acknowledgement Funding for the RFC Editor function is currently provided by the IETF Administrative Support Activity (IASA). Degener & Guenther Standards Track [Page 12] ------- End of Forwarded Message