Sieve Email Filtering: Detecting Duplicate Deliveries
Enschede, NL
stephan@rename-it.nl
Area: General
Workgroup: APPSAWG
Keywords: sieve, duplicate deliveries

This document defines a new test command "duplicate" for the "Sieve" email
filtering language. This test adds the ability to detect duplications. The main
application for this new test is handling duplicate
deliveries commonly caused by mailing list subscriptions or redirected mail
addresses. The detection is normally performed by matching the message ID to an
internal list of message IDs from previously delivered messages. For more
complex applications, the "duplicate" test can also use the content of a
specific header or other parts of the message.

This document specifies an extension to the Sieve filtering language defined
by RFC 5228. It adds a test to track whether or not
a text string was seen before by the delivery agent in an earlier execution of
the Sieve script. This can be used to detect and handle duplicate message
deliveries.

Duplicate deliveries are a common side-effect of being subscribed to a
mailing list. For example, if a member of the list decides to reply to both the
user and the mailing list itself, the user will often get one copy of the
message directly and another through the mailing list. Also, if someone
cross-posts over several mailing lists to which the user is subscribed, the user
will likely receive a copy from each of those lists. In another scenario, the
user has several redirected mail addresses all pointing to his main mail
account. If one of the user's contacts sends the message to more than one of
those addresses, the user will likely receive more than a single copy. Using the
"duplicate" extension, users have the means to detect and handle such
duplicates, e.g. by discarding them, marking them as "seen", or putting them in
a special folder.

Duplicate messages are normally detected using the Message-ID header field,
which is required to be unique for each message. However, the "duplicate" test
is flexible enough to use different criteria for defining what makes a message a
duplicate, for example using the subject line or parts of the message body.
Other applications of this new test command are also possible, as long as the
tracked unique value is a string.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

Conventions for notations are as in Section 1.1 of RFC 5228,
including use of the "Usage:" label for the definition of action and tagged
arguments syntax.

In its basic form, the "duplicate" test keeps track of which messages were
seen before by this test during an earlier Sieve execution. Messages are
by default identified by their message ID as contained in the Message-ID header.
The "duplicate" test evaluates to "true" when the message was seen before and it
evaluates to "false" when it was not.

As a side-effect, the "duplicate" test adds the message ID to an internal
duplicate tracking list once the Sieve execution finishes successfully.
This way, the same test will evaluate to "true" during the next Sieve execution
in which that message ID is encountered. Note that this side-effect is performed
only when the "duplicate" test is actually evaluated. If the "duplicate" test is
nested in a control structure or it is not the first item of an "allof" or
"anyof" test list, its evaluation depends on the result of preceding tests,
which may produce unexpected results.

Implementations MUST only update the internal duplicate tracking list when
the Sieve script execution finishes successfully. If failing script executions
add the message ID to the duplicate tracking list, all "duplicate" tests in the
Sieve script would erroneously yield "true" for the next delivery attempt of the
same message, which can -- depending on the action taken for a duplicate
-- easily lead to discarding the message without further notice.

However, deferring the definitive modification of the tracking list to the
end of a successful Sieve script execution is not without problems. It can cause
a race condition when a duplicate message is delivered in parallel before the
tracking list is updated. This way, a duplicate message could be missed by the
"duplicate" test. More complex implementations could use a locking mechanism
to prevent this problem. But, irrespective of what implementation is chosen,
situations in which the "duplicate" test erroneously yields "true" MUST be
prevented.

The "duplicate" test MUST only check for duplicates amongst message ID values
encountered in previous executions of the Sieve script; it MUST NOT consider
ID values encountered earlier in the current Sieve script execution as
potential duplicates. This means that all "duplicate" tests in a Sieve script
execution, including those located in scripts included using the "include"
extension, MUST always yield the same result if the
arguments are identical.

The Message-ID header field is assumed to be globally unique as required in
Section 3.6.4 of RFC 5322. In practice, this
assumption may not always prove to be true. The "duplicate" test does not
deal with this situation implicitly, which means that false duplicates may be
detected in this case. However, the user can address such situations by
specifying an alternative means of message identification using the ":header" or
the ":uniqueid" argument, as described in the next section.

Duplicate tracking involves determining the unique ID for a given message,
and checking whether that unique ID is in the duplicate tracking list.
The unique ID for a message is determined as follows:
When neither the ":header" argument nor the ":uniqueid" argument is used,
the unique ID is the content of the message's Message-ID header field.

When the ":header" argument is used, the unique ID is the content of the
specified header field in the message. The header field name is not part of the
resulting unique ID; it consists only of the field value.

When the ":uniqueid" argument is used, the unique ID is the string parameter
that is specified with the argument.
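
The three ways of determining the unique ID can be sketched as follows. This is an illustrative sketch, not a listing taken from the specification: it assumes the "variables" extension is available for the third form, and the "discard" action is only a placeholder. When the message carries a Message-ID header field, all three tests track the same value.

```sieve
require ["duplicate", "variables"];

# Default form: the unique ID is the value of the Message-ID header field.
if duplicate {
  discard;
}

# ":header" form: the unique ID is the value of the named header field.
if duplicate :header "message-id" {
  discard;
}

# ":uniqueid" form: the unique ID is the given string, built here from
# the matched Message-ID value using the "variables" extension.
if header :matches "message-id" "*" {
  if duplicate :uniqueid "${0}" {
    discard;
  }
}
```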
The ":header" and ":uniqueid" arguments are mutually exclusive and specifying
both for a single "duplicate" test command MUST trigger an error.

The syntax rules for the header name parameter of the ":header" argument are
specified in Section 2.4.2.2 of RFC 5228.
Note that implementations MUST NOT trigger an error for an invalid header name.
Instead, the "duplicate" test MUST yield "false" unconditionally in this case.
The parameter of the ":uniqueid" argument can be any string.

If the tracked unique ID value is extracted directly from a message header
field, i.e., when the ":uniqueid" argument is not used, the following operations
MUST be performed before the actual duplicate verification:
Unfold the header line as described in RFC 5322,
Section 2.2.3 (see also Section 2.4.2.2 of
RFC 5228).

If possible, convert the header value to Unicode, encoded as UTF-8
(see Section 2.7.2 of RFC 5228). If conversion is
not possible, the value is left unchanged.

Trim leading and trailing whitespace from the header value
(see Section 2.2 of RFC 5228).
Note that these rules also apply to the Message-ID header field used by the
basic "duplicate" test without a ":header" or ":uniqueid" argument.
When the ":uniqueid" argument is used, such normalization concerns are the
responsibility of the user.

If the header field specified using the ":header" argument exists multiple
times in the message, extraction of the unique ID MUST use only the first
occurrence. This is true whether or not multiple occurrences are allowed by
RFC 5322, Section 3.6. If the specified header field
is not present in the message, the "duplicate" test MUST yield "false"
unconditionally. In that case the duplicate tracking list is left unmodified by
this test, since no unique ID value is available. The same rules apply with
respect to the Message-ID header field for the basic "duplicate" test without a
":header" or ":uniqueid" argument, since that header field could also be missing
or occur multiple times.

The string parameter of the ":uniqueid" argument can be composed from
arbitrary text extracted from the message using the "variables"
extension. To extract text from the message body,
the "foreverypart" and "extracttext" extensions
need to be used as well. This provides the user with detailed control over how
the message's unique ID is created.

The unique ID MUST be matched case-sensitively with the contents of the
duplicate tracking list, irrespective of how the unique ID was determined. To
achieve case-insensitive behavior when the ":uniqueid" argument is used, the
"set" command added by the "variables" extension can
be used to normalize the unique ID value to upper or lower case.

The "duplicate" test MUST track a unique ID value independent of its source.
This means that it does not matter whether values are obtained from the
message ID header, from an arbitrary header specified using the ":header"
argument or explicitly from the ":uniqueid" argument. The following three
examples are equivalent and match the same entry in the duplicate tracking
list:

The ":handle" argument can be used to override this default behavior. The
":handle" argument separates a "duplicate" test from other duplicate tests with
a different or omitted ":handle" argument. Using the ":handle" argument,
unrelated "duplicate" tests can be prevented from interfering with each other:
a message is only recognized as a duplicate when the tracked unique ID was
seen before in an earlier script execution by a "duplicate" test with the
same ":handle" argument.

NOTE: The necessary mechanism to track duplicate messages is very similar to
the mechanism that is needed for tracking duplicate responses for the "vacation"
action. One way to implement the necessary mechanism
for the "duplicate" test is therefore to store a hash of the tracked unique ID
and, if provided, the ":handle" argument.

Implementations SHOULD let entries in the tracking list expire after a
short period of time. The user can explicitly control the length of this
expiration time by means of the ":seconds" argument, which accepts an integer
value specifying the timeout value in seconds. If the ":seconds" argument is
omitted, an appropriate default value MUST be used. A default expiration time of
around 7 days is usually appropriate. Sites SHOULD impose a maximum limit
on the expiration time. If that limit is exceeded by the ":seconds" argument,
the maximum value MUST silently be substituted; exceeding the limit MUST NOT
produce an error. If the ":seconds" argument is zero, the "duplicate" test MUST
yield "false" unconditionally.

When the ":last" argument is omitted, the expiration time for entries in the
duplicate tracking list MUST be measured relative to the moment at which the
entry was first created; i.e., at the end of the successful script execution
during which "duplicate" test returned "false" for a message with that
particular unique ID value. This means that subsequent duplicate messages have
no influence on the time at which the entry in the duplicate tracking list
finally expires.

In contrast, when the ":last" argument is specified, the expiration time
MUST be measured relative to the last script execution during which the
"duplicate" test was used to check the entry's unique ID value. This
effectively means that the entry in the duplicate tracking list will not expire
while duplicate messages with the corresponding unique ID keep being delivered
within intervals smaller than the expiration time.

It is possible to write Sieve scripts where during a single execution more
than one "duplicate" test is evaluated with the same unique ID value and
":handle" argument but different ":seconds" or ":last" arguments. The resulting
behavior is left undefined by this specification, so such constructs should be
avoided. Implementations MAY choose to use the ":seconds" and ":last" arguments
from the "duplicate" test that was evaluated last.

The "duplicate" test does not support either the "index"
or "mime" extensions
directly, meaning that none of the ":index", ":mime" or associated arguments
are added to the "duplicate" test when these extensions are active. The
":uniqueid" argument can be used in combination with the "variables"
extension to achieve the same result indirectly.

Normally, Sieve scripts are executed at final delivery. However, with the
"imapsieve" extension, Sieve scripts are invoked
when the IMAP server performs operations on the message
store, e.g. when messages are uploaded, flagged, or moved to another location.
The "duplicate" test is devised for use at final delivery and the semantics in
the "imapsieve" context are left undefined. Therefore, implementations SHOULD
NOT allow the "duplicate" test to be used in the context of "imapsieve".

A Sieve implementation that defines the "duplicate" test command
will advertise the capability string "duplicate".
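
A script opts in to the test by naming this capability in its "require" command. The following minimal sketch combines the arguments described above in a single test. The header name "X-Event-ID", the handle "events", and the mailbox name "Duplicates" are hypothetical values chosen for illustration, and the "fileinto" action is only one possible way to handle a duplicate.

```sieve
require ["duplicate", "fileinto"];

# Track the value of a hypothetical "X-Event-ID" header under its own
# handle. With ":last", an entry expires 3600 seconds after the most
# recent execution in which this test checked it.
if duplicate :handle "events" :header "X-Event-ID"
             :seconds 3600 :last {
  fileinto "Duplicates";
}
```

Because ":header" and ":uniqueid" are mutually exclusive, a test of this shape can carry only one of the two.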
In this basic example, message duplicates are detected by tracking
the Message-ID header. Duplicate deliveries are stored in a special folder
contained in the user's Trash folder. If the folder does not exist, it is
created automatically using the "mailbox" extension.
This way, the user has a chance to recover messages when necessary. Messages
that are not recognized as duplicates are stored in the user's inbox as normal.
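
A script matching this description could be sketched as follows. The folder name "Trash/Duplicate" and the "/" hierarchy separator are assumptions; the ":create" argument comes from the "mailbox" extension.

```sieve
require ["duplicate", "fileinto", "mailbox"];

# Duplicates go to a subfolder of Trash, created on demand.
# Everything else falls through to the implicit keep (the inbox).
if duplicate {
  fileinto :create "Trash/Duplicate";
}
```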
This example shows a more complex use of the "duplicate" test. The user
gets network alerts from a set of remote automated monitoring systems. Several
notifications can be received about the same event from different monitoring
systems. The Message-ID of these messages is different, because these are all
distinct messages from different senders. To avoid being notified more than
a single time about the same event the user writes the following script:

The subjects of the notification message are structured with a predictable
pattern which includes a description of the event. In the script above, the
"duplicate" test is used to detect duplicate alert events. The message subject
is matched against a pattern and the event description is extracted using the
"variables" extension. If a message with that
event in the subject was received before, but more than a minute ago, it is not
detected as a duplicate due to the specified ":seconds" argument. In the
event of a duplicate, the message is marked as "seen" using the "imap4flags"
extension. All alert messages are put into the
"Alerts" mailbox irrespective of whether those messages are duplicates or not.
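
A script matching this description could be sketched as follows. The subject pattern "ALERT: *" is an assumption about how the monitoring systems format their subjects; the 60-second timeout and the "Alerts" mailbox follow the description above.

```sieve
require ["duplicate", "variables", "imap4flags", "fileinto"];

if header :matches "subject" "ALERT: *" {
  # ${1} holds the event description matched by the wildcard.
  if duplicate :seconds 60 :uniqueid "${1}" {
    setflag "\\Seen";
  }
  # All alerts are filed, whether duplicate or not.
  fileinto "Alerts";
}
```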
This example shows how the "duplicate" test can be used to limit the
frequency of notifications sent using the "enotify"
extension. Consider the following scenario: a mail user receives XMPP
notifications about new mail through Sieve, but
sometimes a single contact sends many messages in a short period of time. Now
the user wants to prevent being notified of all of those messages. The user
wants to be notified about messages from each person at most once per 30
minutes and writes the following script:

The example shown above uses the message envelope sender rather than the
Message-ID header as the unique ID for duplicate tracking.

The example can be extended to allow more messages from the same sender
in close succession as long as the discussed subject is different. This can be
achieved as follows:

This uses a combination of the message envelope sender and the subject of
the message as the unique ID for duplicate tracking.

For this example, the mail user uses the "duplicate" test for two separate
applications: for discarding duplicate events from a notification system and
to mark certain follow-up messages in a software support mailing list as "seen" using
the "imap4flags" extension.

The two "duplicate" tests in the following example each use a different
header to identify messages. However, these "X-Event-ID" and "X-Ticket-ID"
headers can have similar values in this case (e.g. both based on a time stamp),
meaning that one "duplicate" test can erroneously detect duplicates based on ID
values tracked by the other. Therefore, the user wants to prevent the second
"duplicate" test from matching ID values tracked by the first "duplicate" test
and vice versa. This is achieved by specifying different ":handle" arguments for
these tests.

A flood of unique messages could cause the list of tracked message ID values
to grow indefinitely. Therefore, implementations SHOULD limit the number of
entries in the duplicate tracking list. When limiting the number of entries,
implementations SHOULD discard the oldest ones first.

The following template specifies the IANA registration of the Sieve
extension specified in this document:

This information should be added to the list of sieve extensions
given on http://www.iana.org/assignments/sieve-extensions.

Thanks to Cyrus Daboo, Arnt Gulbrandsen, Tony Hansen, Kristin Hubner,
Barry Leiba, Alexey Melnikov, Subramanian Moonesamy, Tom Petch, Hector Santos,
Robert Sparks, and Aaron Stone for reviews and suggestions. With special thanks
to Ned Freed for his guidance and support.

Sieve Email Filtering: Date and Index Extensions
This document describes the "date" and "index" extensions to the Sieve
email filtering language. The "date" extension gives Sieve the ability to
test date and time values in various ways. The "index" extension provides
a means to limit header and address tests to specific instances of header
fields when header fields are repeated. [STANDARDS-TRACK]

Key words for use in RFCs to Indicate Requirement Levels
In many standards track documents several words are used to signify
the requirements in the specification. These words are often
capitalized. This document defines these words as they should be
interpreted in IETF documents. Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
Note that the force of these words is modified by the requirement
level of the document in which they are used.

Internet Message Format
This document specifies the Internet Message Format (IMF), a syntax for text messages
that are sent between computer users, within the framework of "electronic mail"
messages. This specification is a revision of Request For Comments (RFC) 2822, which
itself superseded Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and incorporating
incremental changes that were specified in other RFCs.

Support for Internet Message Access Protocol (IMAP) Events in Sieve
Sieve defines an email filtering language that can, in
principle, plug into any point in the processing of an email message.
As defined in the base specification, it plugs into mail delivery. This
document defines how Sieve can plug into points in IMAP where messages
are created or changed, adding the option of user-defined or
installation-defined filtering (or, with Sieve extensions, features such
as notifications). Because this requires future Sieve extensions to
specify their interactions with this one, this document updates the base
Sieve specification, RFC 5228. [STANDARDS-TRACK]

Sieve Email Filtering: Include Extension
The Sieve Email Filtering "include" extension permits users to include
one Sieve script inside another. This can make managing large scripts or
multiple sets of scripts much easier, and allows a site and its users to
build up libraries of scripts. Users are able to include their own
personal scripts or site-wide scripts. [STANDARDS-TRACK]

Sieve: An Email Filtering Language
This document describes a language for filtering email messages at time of final delivery.
It is designed to be implementable on either a mail client or mail server. It is meant to
be extensible, simple, and independent of access protocol, mail architecture, and operating
system. It is suitable for running on a mail server where users may not be allowed to
execute arbitrary programs, such as on black box Internet Message Access Protocol (IMAP)
servers, as the base language has no variables, loops, or ability to shell out to external
programs. [STANDARDS-TRACK]

Sieve Email Filtering: MIME Part Tests, Iteration, Extraction, Replacement, and Enclosure
This document defines extensions to the Sieve email filtering language to
permit analysis and manipulation of the MIME body parts of an email message.
[STANDARDS-TRACK]

Sieve Email Filtering: Variables Extension
In advanced mail filtering rule sets, it is useful to keep state or configuration
details across rules. This document updates the Sieve filtering language (RFC 5228)
with an extension to support variables. The extension changes the interpretation of
strings, adds an action to store data in variables, and supplies a new test so that
the value of a string can be examined. [STANDARDS-TRACK]

INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
The Internet Message Access Protocol, Version 4rev1 (IMAP4rev1) allows a
client to access and manipulate electronic mail messages on a server.
IMAP4rev1 permits manipulation of mailboxes (remote message folders) in a
way that is functionally equivalent to local folders. IMAP4rev1 also
provides the capability for an offline client to resynchronize with the
server. IMAP4rev1 includes operations for creating, deleting, and renaming
mailboxes, checking for new messages, permanently removing messages, setting
and clearing flags, RFC 2822 and RFC 2045 parsing, searching, and selective
fetching of message attributes, texts, and portions thereof. Messages in
IMAP4rev1 are accessed by the use of numbers. These numbers are either
message sequence numbers or unique identifiers. IMAP4rev1 supports a single
server. A mechanism for accessing configuration information to support
multiple IMAP4rev1 servers is discussed in RFC 2244. IMAP4rev1 does not
specify a means of posting mail; this function is handled by a mail transfer
protocol such as RFC 2821. [STANDARDS-TRACK]

Sieve Email Filtering: Imap4flags Extension
Recent discussions have shown that it is desirable to set different IMAP (RFC 3501)
flags on message delivery. This can be done, for example, by a Sieve interpreter that
works as a part of a Mail Delivery Agent. This document describes an
extension to the Sieve mail filtering language for setting IMAP flags. The extension
allows setting of both IMAP system flags and IMAP keywords. [STANDARDS-TRACK]

The Sieve Mail-Filtering Language -- Extensions for Checking Mailbox Status and Accessing Mailbox Metadata
This memo defines an extension to the Sieve mail filtering language (RFC 5228) for accessing mailbox and server
annotations, checking for mailbox existence, and controlling mailbox creation on "fileinto" action. [STANDARDS-TRACK]

Sieve Email Filtering: Extension for Notifications
Users go to great lengths to be notified as quickly as possible that
they have received new mail. Most of these methods involve polling to
check for new messages periodically. A push method handled by the final
delivery agent gives users quicker notifications and saves server
resources. This document does not specify the notification method, but
it is expected that using existing instant messaging infrastructure such
as Extensible Messaging and Presence Protocol (XMPP), or Global System for
Mobile Communications (GSM) Short Message Service (SMS) messages will be
popular. This document describes an extension to the Sieve mail filtering
language that allows users to give specific rules for how and when
notifications should be sent. [STANDARDS-TRACK]

Sieve Notification Mechanism: Extensible Messaging and Presence Protocol (XMPP)
This document describes a profile of the Sieve extension for
notifications, to allow notifications to be sent over the Extensible
Messaging and Presence Protocol (XMPP), also known as Jabber.
[STANDARDS-TRACK]

Sieve Email Filtering: Vacation Extension
This document describes an extension to the Sieve email filtering language
for an autoresponder similar to that of the Unix "vacation" command for replying
to messages. Various safety features are included to prevent problems such as
message loops. [STANDARDS-TRACK]