APPSAWG                                                         S. Bosch
Internet-Draft                                              May 22, 2013
Intended status: Standards Track
Expires: November 23, 2013

         Sieve Email Filtering: Detecting Duplicate Deliveries
                 draft-ietf-appsawg-sieve-duplicate-00

Abstract

   This document defines a new test command "duplicate" for the "Sieve"
   email filtering language. This test adds the ability to detect
   duplicate message deliveries.
   The main application for this new test is handling duplicate
   deliveries commonly caused by mailing list subscriptions or
   redirected mail addresses. The detection is normally performed by
   matching the message ID to an internal list of message IDs from
   previously delivered messages. For more complex applications, the
   "duplicate" test can also use the content of a specific header or
   other parts of the message.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 23, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions Used in This Document
   3. Test "duplicate"
   4. Sieve Capability Strings
   5. Examples
     5.1. Example 1
     5.2. Example 2
     5.3. Example 3
   6. Security Considerations
   7. IANA Considerations
   8. References
     8.1. Normative References
     8.2. Informative References
   Author's Address

1. Introduction

   This is an extension to the Sieve filtering language defined by RFC
   5228 [SIEVE]. It adds a test to determine whether a certain message
   was seen before by the delivery agent in an earlier execution of the
   Sieve script. This can be used to detect and handle duplicate
   message deliveries.

   Duplicate deliveries are a common side-effect of being subscribed to
   a mailing list. For example, if a member of the list decides to
   reply to both the user and the mailing list itself, the user will get
   one copy of the message directly and another through the mailing
   list. Also, if someone cross-posts to several mailing lists to which
   the user is subscribed, the user will receive a copy from each of
   those lists. In another scenario, the user has several redirected
   mail addresses all pointing to their main mail account. If one of
   the user's contacts sends a message to more than one of those
   addresses, the user will likely receive more than a single copy.
   Using the "duplicate" extension, users have the means to detect and
   handle such duplicates, e.g.,
   by discarding them, marking them as "seen", or putting them in a
   special folder.

   Duplicate messages are normally detected using the Message-ID header
   field, which is required to be unique for each message. However, the
   "duplicate" test is flexible enough to use different (weaker)
   criteria for defining what makes a message a duplicate, for example
   based on the subject line or parts of the message body. Other
   applications of this new test command are also possible, as long as
   the tracked unique value is a string.

2. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

   Conventions for notations are as in [SIEVE] Section 1.1, including
   use of the "Usage:" label for the definition of action and tagged
   arguments syntax.

3. Test "duplicate"

   Usage: "duplicate" [":handle" <handle: string>]
                      [":header" <header-name: string> /
                          ":uniqueid" <value: string>]
                      [":seconds" <timeout: number>]

   In its basic form, the "duplicate" test keeps track of which messages
   were seen before by this test during an earlier Sieve execution.
   Messages are identified by their message ID as contained in the
   Message-ID header field. The "duplicate" test evaluates to "true"
   when the message was seen before and to "false" when it was not.

   As a side-effect, the "duplicate" test adds the message ID to an
   internal duplicate tracking list once the Sieve execution finishes
   successfully. This way, the same test will evaluate to "true" during
   the next Sieve execution. Implementations MUST NOT make any
   definitive modifications to the internal duplicate tracking list
   until the Sieve script execution finishes successfully.
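
   The deferred update described above can be illustrated with a
   minimal Python sketch. It is not part of this specification: the
   names TrackingList, ScriptExecution, commit, and abort are
   hypothetical, persistence is reduced to an in-memory set, and
   concurrent deliveries of the same message are not handled. Because
   lookups consult only entries committed by earlier executions,
   repeated tests within one execution also return the same answer.

```python
from dataclasses import dataclass, field


@dataclass
class TrackingList:
    """Hypothetical persistent store of unique IDs from earlier executions."""
    seen: set[str] = field(default_factory=set)


class ScriptExecution:
    """One Sieve execution; tracking-list updates are staged until commit()."""

    def __init__(self, tracking: TrackingList) -> None:
        self.tracking = tracking
        self.pending: set[str] = set()

    def duplicate(self, unique_id: str) -> bool:
        # Only IDs committed by earlier executions count; IDs staged in this
        # execution are ignored, so identical tests always agree.
        result = unique_id in self.tracking.seen
        self.pending.add(unique_id)  # staged, not yet persisted
        return result

    def commit(self) -> None:
        # Called only after the script execution finished successfully.
        self.tracking.seen |= self.pending
        self.pending.clear()

    def abort(self) -> None:
        # A failed execution leaves the tracking list untouched, so a retried
        # delivery of the same message is not mistaken for a duplicate.
        self.pending.clear()


# A failed execution followed by a successful retry of the same message.
tracking = TrackingList()
first = ScriptExecution(tracking)
assert first.duplicate("<a@example.com>") is False
first.abort()

retry = ScriptExecution(tracking)
assert retry.duplicate("<a@example.com>") is False
assert retry.duplicate("<a@example.com>") is False  # same answer in one run
retry.commit()

assert ScriptExecution(tracking).duplicate("<a@example.com>") is True
```

   In this sketch, a delivery agent would call commit() only after all
   actions of the script have been carried out, and abort() otherwise.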

   If a failed script execution were to add the message ID to the
   duplicate tracking list, all "duplicate" tests in the Sieve script
   would erroneously yield "true" for the next delivery attempt of the
   same message. Depending on the action taken for a duplicate, this can
   easily lead to the message being discarded without further notice.

   However, deferring the definitive modification of the tracking list
   to the end of a successful Sieve script execution is not without
   problems. It can cause a race condition when a duplicate message is
   delivered in parallel, before the tracking list is updated; in that
   case, the duplicate could be missed by the "duplicate" test. More
   complex implementations could use a locking mechanism to prevent
   this problem. In any case, whatever implementation is chosen,
   situations in which the "duplicate" test erroneously yields "true"
   MUST be prevented at all costs.

   The "duplicate" test MUST only check for duplicates amongst message
   ID values encountered in previous executions of the Sieve script; it
   MUST NOT consider ID values encountered earlier in the current Sieve
   script execution as potential duplicates. This means that all
   "duplicate" tests in a Sieve script execution, including those
   located in scripts included using the "include" [INCLUDE] extension,
   MUST always yield the same result if the arguments are identical.

   Implementations SHOULD limit the number of entries in the duplicate
   tracking list. When limiting the number of entries, implementations
   SHOULD discard the oldest ones first.

   Also, implementations SHOULD let entries in the tracking list expire
   after a short period of time. The user can explicitly control the
   length of this expiration time by means of the ":seconds" argument,
   which is always specified in seconds. If the ":seconds" argument is
   omitted, an appropriate default value MUST be used. A default
A default 163 expiration time of around 7 days is deemed to be appropriate. Sites 164 SHOULD impose a maximum limit on the expiration time. If that limit 165 is exceeded, the maximum value MUST silently be substituted; 166 exceeding the limit MUST NOT produce an error. If the ":seconds" 167 argument is zero, the "duplicate" test MUST yield "false" 168 unconditionally. 170 By default, the content of the message's Message-ID header field is 171 used as the unique ID for duplicate tracking. For more complex 172 applications, the "duplicate" test can also be used to detect 173 duplicate deliveries based on other message text. Then, the tracked 174 unique ID can be an arbitrary string value extracted from the 175 message. By adding the ":header" argument with a message header 176 field name, the content of the specified header field can be used as 177 the tracked unique ID instead of the default Message-ID header. 178 Alternatively, the tracked unique ID can be specified explicitly 179 using the ":uniqueid" argument. The ":header" and ":uniqueid" 180 arguments are mutually exclusive and specifying both for a single 181 "duplicate" test command MUST trigger an error. 183 If the tracked unique ID value is extracted directly from a message 184 header field, i.e. when the ":uniqueid" argument is not used, leading 185 and trailing whitespace (see Section 2.2 of RFC 5228 [SIEVE]) MUST 186 first be trimmed from the value before performing the actual 187 duplicate verification. When the ":uniqueid" argument is used, such 188 normalization concerns are the responsibility of the user. 190 If the header field specified using the ":header" argument exists 191 multiple times in the message, only the first occurrence MUST be used 192 for duplicate tracking. If the specified header field is not present 193 in the message, the "duplicate" test MUST yield "false" 194 unconditionally. 
   In that case, the duplicate tracking list is left unmodified by this
   test, since no unique ID value is available. The same rules apply
   with respect to the Message-ID header field for the basic
   "duplicate" test without a ":header" or ":uniqueid" argument, since
   that header field could also be missing or occur multiple times.

   The string parameter of the ":uniqueid" argument can be composed
   from arbitrary text extracted from the message using the "variables"
   [VARIABLES] extension. To extract text from the message body, the
   "foreverypart" and "extracttext" [SIEVE-MIME] extensions need to be
   used as well. This provides the user with detailed control over what
   identifies a message as a duplicate.

   Note that the "duplicate" test does not directly support the "index"
   [DATE-INDEX] or "mime" [SIEVE-MIME] extensions, meaning that none of
   the ":index", ":mime", or associated arguments are added to the
   "duplicate" test when these extensions are active. The ":uniqueid"
   argument can be used in combination with the "variables" [VARIABLES]
   extension to achieve the same result indirectly.

   The tracked unique ID value MUST be matched case-sensitively,
   irrespective of whether it originates from a header field or is
   specified explicitly using the ":uniqueid" argument. To achieve
   case-insensitive behavior, the "set" command added by the "variables"
   [VARIABLES] extension can be used in combination with the
   ":uniqueid" argument to normalize the tracked unique ID value to
   upper or lower case.

   Using the ":handle" argument, the "duplicate" test can be employed
   for multiple independent purposes. The message is recognized as a
   duplicate only when the tracked unique ID was seen before in an
   earlier script execution by a "duplicate" test with the same
   ":handle" argument.
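
   The rules in this section for deriving, limiting, and matching the
   tracked unique ID can be summarized in a minimal Python sketch. It is
   illustrative rather than normative: the function and class names are
   hypothetical, header fields are modeled as a list of (name, value)
   pairs in message order, and the site limits DEFAULT_SECONDS,
   MAX_SECONDS, and MAX_ENTRIES are assumed values (the default follows
   the "around 7 days" suggestion). Entries are keyed on the ":handle"
   value together with the exact, case-sensitive unique ID. This
   document does not say whether a repeated sighting extends an entry's
   expiration time; the sketch assumes it does not.

```python
import time
from collections import OrderedDict
from typing import Optional

# Assumed site policy values; this document fixes none of them.
DEFAULT_SECONDS = 7 * 24 * 3600  # default expiration of about 7 days
MAX_SECONDS = 30 * 24 * 3600     # site maximum; larger values are clamped
MAX_ENTRIES = 10_000             # cap on the size of the tracking list

Key = tuple[Optional[str], str]  # (handle, unique ID)


def extract_unique_id(headers: list[tuple[str, str]],
                      header: Optional[str] = None,
                      uniqueid: Optional[str] = None) -> Optional[str]:
    """Return the tracked unique ID, or None if no value is available."""
    if header is not None and uniqueid is not None:
        raise ValueError(":header and :uniqueid are mutually exclusive")
    if uniqueid is not None:
        return uniqueid  # used verbatim; normalization is up to the user
    wanted = (header or "Message-ID").lower()  # field names are caseless
    for name, value in headers:
        if name.lower() == wanted:
            # First occurrence only, with leading/trailing whitespace trimmed.
            return value.strip(" \t\r\n")
    return None  # header field is missing


class DuplicateList:
    """Maps (handle, unique ID) to an expiry timestamp, oldest first."""

    def __init__(self) -> None:
        self.entries: "OrderedDict[Key, float]" = OrderedDict()

    def seen(self, key: Key, now: float) -> bool:
        expiry = self.entries.get(key)
        return expiry is not None and now < expiry

    def record(self, key: Key, seconds: int, now: float) -> None:
        if self.seen(key, now):
            return  # assumption: a live entry keeps its original expiry
        self.entries.pop(key, None)  # drop any expired entry for this key
        self.entries[key] = now + seconds  # appended as the newest entry
        while len(self.entries) > MAX_ENTRIES:
            self.entries.popitem(last=False)  # discard the oldest first


def duplicate(dlist: DuplicateList, headers: list[tuple[str, str]], *,
              handle: Optional[str] = None, header: Optional[str] = None,
              uniqueid: Optional[str] = None, seconds: Optional[int] = None,
              now: Optional[float] = None
              ) -> tuple[bool, Optional[tuple[Key, int]]]:
    """Evaluate one "duplicate" test.

    Returns the result and the entry to record once the whole script has
    finished successfully, or None when nothing is to be recorded.
    """
    now = time.time() if now is None else now
    unique_id = extract_unique_id(headers, header, uniqueid)
    if unique_id is None:
        return False, None  # missing header field: "false", list unchanged
    seconds = DEFAULT_SECONDS if seconds is None else min(seconds, MAX_SECONDS)
    if seconds == 0:
        return False, None  # ":seconds 0" always yields "false"
    key = (handle, unique_id)  # exact, case-sensitive match per handle
    return dlist.seen(key, now), (key, seconds)
```

   After a successful execution, the caller would store each returned
   entry with dlist.record(*entry, now=...). A later test with the same
   handle and message then returns True until the stored expiry passes,
   while a test with a different ":handle" value returns False.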

   NOTE: The mechanism needed to track duplicate messages is very
   similar to the one needed for tracking duplicate responses for the
   "vacation" [VACATION] action. One way to implement the necessary
   mechanism for the "duplicate" test is therefore to store a hash of
   the tracked unique ID and, if provided, the ":handle" argument.

4. Sieve Capability Strings

   A Sieve implementation that defines the "duplicate" test command will
   advertise the capability string "duplicate".

5. Examples

5.1. Example 1

   In this basic example, message duplicates are detected by tracking
   the Message-ID header field. Duplicate deliveries are stored in a
   special folder contained in the user's Trash folder. If the folder
   does not exist, it is created automatically using the "mailbox"
   [MAILBOX] extension. This way, the user has a chance to recover
   messages when necessary. Messages that are not recognized as
   duplicates are stored in the user's inbox as normal.

      require ["duplicate", "fileinto", "mailbox"];

      # File duplicates into a recovery folder, creating it if needed;
      # all other messages are kept in the inbox (implicit keep).
      if duplicate {
        fileinto :create "Trash/Duplicate";
      }

5.2. Example 2

   This example shows a more complex use of the "duplicate" test. The
   user gets network alerts from a set of remote automated monitoring
   systems. Multiple notifications can be received about the same event
   from different monitoring systems. The Message-ID header fields of
   these messages differ, because these are all distinct messages from
   different senders. To avoid being notified multiple times about the
   same event, the user writes the following script:

      require ["duplicate", "variables", "imap4flags",
        "fileinto"];

      if header :matches "subject" "ALERT: *" {
        # ${1} holds the event description, which serves as the tracked
        # unique ID with a 60-second expiration time.
        if duplicate :seconds 60 :uniqueid "${1}" {
          setflag "\\seen";
        }
        fileinto "Alerts";
      }

   The subjects of the notification messages are structured with a
   predictable pattern that includes a description of the event.
   In the script above, the "duplicate" test is used to detect duplicate
   alert events. The message subject is matched against a pattern and
   the event description is extracted using the "variables" [VARIABLES]
   extension. If a message with that event in the subject was received
   before, but more than a minute ago, it is not detected as a duplicate
   due to the specified ":seconds" argument. In the event of a
   duplicate, the message is marked as "seen" using the "imap4flags"
   [IMAP4FLAGS] extension. All alert messages are put into the "Alerts"
   mailbox irrespective of whether those messages are duplicates or not.

5.3. Example 3

   This example shows how the "duplicate" test can be used to limit the
   frequency of notifications sent using the "enotify" [NOTIFY]
   extension. Consider the following scenario: a mail user receives
   XMPP notifications [NOTIFY-XMPP] about new mail through Sieve, but
   sometimes a single contact sends many messages in a short period of
   time. The user does not want to be notified about every one of those
   messages; instead, the user wants to be notified about messages from
   each person at most once per 30 minutes and writes the following
   script:

      require ["variables", "envelope", "enotify", "duplicate"];

      if envelope :matches "from" "*" { set "sender" "${1}"; }
      if header :matches "subject" "*" { set "subject" "${1}"; }

      # 1800 seconds = 30 minutes; the envelope sender is the unique ID.
      if not duplicate :seconds 1800 :uniqueid "${sender}"
      {
        notify :message "[SIEVE] ${sender}: ${subject}"
          "xmpp:user@im.example.com";
      }

   The example shown above uses the message envelope sender rather than
   the Message-ID header field as the unique ID for duplicate tracking.

   The example can be extended to allow multiple messages from the same
   sender in close succession as long as the discussed subject is
   different.
   This can be achieved as follows:

      require ["variables", "envelope", "enotify", "duplicate"];

      if envelope :matches "from" "*" { set "sender" "${1}"; }
      if header :matches "subject" "*" { set "subject" "${1}"; }

      # Account for a 'Re:' prefix; the pattern includes the space after
      # the colon, so "Re: foo" and "foo" yield the same unique ID.
      if string :comparator "i;ascii-casemap"
        :matches "${subject}" "Re: *"
      {
        set "subject" "${1}";
      }
      if not duplicate :seconds 1800
        :uniqueid "${sender} ${subject}"
      {
        notify :message "[SIEVE] ${sender}: ${subject}"
          "xmpp:user@im.example.com";
      }

   This uses a combination of the message envelope sender and the
   subject of the message as the unique ID for duplicate tracking.

6. Security Considerations

   A flood of unique messages could cause the list of tracked message ID
   values to grow indefinitely. Implementations therefore SHOULD
   implement limits on the number and lifespan of entries in that list.

7. IANA Considerations

   The following template specifies the IANA registration of the Sieve
   extension specified in this document:

   To: iana@iana.org
   Subject: Registration of new Sieve extension

   Capability name: duplicate
   Description:     Adds test 'duplicate' that can be used to test
                    whether a particular message is a duplicate,
                    i.e., whether a copy of it was seen before by the
                    delivery agent that is executing the Sieve
                    script.
   RFC number:      this RFC
   Contact address: Sieve mailing list

   This information should be added to the list of Sieve extensions
   given on http://www.iana.org/assignments/sieve-extensions.

8. References

8.1. Normative References

   [DATE-INDEX]
              Freed, N., "Sieve Email Filtering: Date and Index
              Extensions", RFC 5260, July 2008.

   [INCLUDE]  Daboo, C. and A. Stone, "Sieve Email Filtering: Include
              Extension", RFC 6609, May 2012.

   [KEYWORDS]
              Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [SIEVE]    Guenther, P. and T.
              Showalter, "Sieve: An Email Filtering Language", RFC 5228,
              January 2008.

   [SIEVE-MIME]
              Hansen, T. and C. Daboo, "Sieve Email Filtering: MIME Part
              Tests, Iteration, Extraction, Replacement, and Enclosure",
              RFC 5703, October 2009.

   [VARIABLES]
              Homme, K., "Sieve Email Filtering: Variables Extension",
              RFC 5229, January 2008.

8.2. Informative References

   [IMAP4FLAGS]
              Melnikov, A., "Sieve Email Filtering: Imap4flags
              Extension", RFC 5232, January 2008.

   [MAILBOX]  Melnikov, A., "The Sieve Mail-Filtering Language --
              Extensions for Checking Mailbox Status and Accessing
              Mailbox Metadata", RFC 5490, March 2009.

   [NOTIFY]   Melnikov, A., Leiba, B., Segmuller, W., and T. Martin,
              "Sieve Email Filtering: Extension for Notifications",
              RFC 5435, January 2009.

   [NOTIFY-XMPP]
              Saint-Andre, P. and A. Melnikov, "Sieve Notification
              Mechanism: Extensible Messaging and Presence Protocol
              (XMPP)", RFC 5437, January 2009.

   [VACATION]
              Showalter, T. and N. Freed, "Sieve Email Filtering:
              Vacation Extension", RFC 5230, January 2008.

Author's Address

   Stephan Bosch
   Enschede
   NL

   Email: stephan@rename-it.nl