idnits 2.17.1

draft-ietf-appsawg-sieve-duplicate-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The document date (November 4, 2013) is 3816 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 3501 (ref.
     'IMAP') (Obsoleted by RFC 9051)

     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

APPSAWG                                                         S. Bosch
Internet-Draft                                          November 4, 2013
Intended status: Standards Track
Expires: May 8, 2014


         Sieve Email Filtering: Detecting Duplicate Deliveries
                 draft-ietf-appsawg-sieve-duplicate-01

Abstract

   This document defines a new test command "duplicate" for the "Sieve"
   email filtering language.  This test adds the ability to detect
   duplicate message deliveries.  The main application for this new test
   is handling duplicate deliveries commonly caused by mailing list
   subscriptions or redirected mail addresses.  The detection is
   normally performed by matching the message ID to an internal list of
   message IDs from previously delivered messages.  For more complex
   applications, the "duplicate" test can also use the content of a
   specific header or other parts of the message.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 8, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  Test "duplicate"
     3.1.  Interaction with Other Sieve Extensions
   4.  Sieve Capability Strings
   5.  Examples
     5.1.  Example 1
     5.2.  Example 2
     5.3.  Example 3
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Author's Address

1.  Introduction

   This is an extension to the Sieve filtering language defined by RFC
   5228 [SIEVE].  It adds a test to determine whether a certain message
   was seen before by the delivery agent in an earlier execution of the
   Sieve script.  This can be used to detect and handle duplicate
   message deliveries.

   Duplicate deliveries are a common side-effect of being subscribed to
   a mailing list.  For example, if a member of the list decides to
   reply to both the user and the mailing list itself, the user will get
   one copy of the message directly and another through the mailing
   list.
   Also, if someone cross-posts over several mailing lists to which the
   user is subscribed, the user will receive a copy from each of those
   lists.  In another scenario, the user has several redirected mail
   addresses all pointing to his main mail account.  If one of the
   user's contacts sends the message to more than one of those
   addresses, the user will likely receive more than a single copy.
   Using the "duplicate" extension, users have the means to detect and
   handle such duplicates, e.g. by discarding them, marking them as
   "seen", or putting them in a special folder.

   Duplicate messages are normally detected using the Message-ID header
   field, which is required to be unique for each message.  However, the
   "duplicate" test is flexible enough to use different criteria for
   defining what makes a message a duplicate, for example based on the
   subject line or parts of the message body.  Other applications of
   this new test command are also possible, as long as the tracked
   unique value is a string.

2.  Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

   Conventions for notations are as in [SIEVE] Section 1.1, including
   use of the "Usage:" label for the definition of action and tagged
   arguments syntax.

3.  Test "duplicate"

   Usage: "duplicate" [":handle" <handle: string>]
                      [":header" <header-name: string> /
                          ":uniqueid" <value: string>]
                      [":seconds" <timeout: number>] [":last"]

   In its basic form, the "duplicate" test keeps track of which messages
   were seen before by this test during an earlier Sieve execution.
   Messages are identified by their message ID as contained in the
   Message-ID header.  The "duplicate" test evaluates to "true" when the
   message was seen before and it evaluates to "false" when it was not.
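
   The basic behaviour described above can be modelled in a few lines.
   The following is only an illustrative sketch in Python, not part of
   the extension and not taken from any implementation: the class name
   DuplicateTracker, the method run_script, and the in-memory set are
   all assumptions made for the example.  It shows one delivery per
   call, with the test result determined from IDs recorded by earlier
   calls only.

```python
class DuplicateTracker:
    """Toy model of the basic "duplicate" test (all names hypothetical)."""

    def __init__(self):
        # Message IDs recorded by earlier (simulated) Sieve executions.
        self._seen = set()

    def run_script(self, message_id):
        """Simulate one Sieve execution and return the test result."""
        # The test only consults IDs recorded by earlier executions.
        is_duplicate = message_id in self._seen
        # Record the ID after the result is known, so a delivery is
        # never reported as a duplicate of itself.
        self._seen.add(message_id)
        return is_duplicate


tracker = DuplicateTracker()
first = tracker.run_script("<123456@example.com>")   # first delivery
second = tracker.run_script("<123456@example.com>")  # same ID again
other = tracker.run_script("<654321@example.com>")   # unrelated message
print(first, second, other)  # prints: False True False
```

   A real implementation would need persistent storage, and would also
   have to handle messages that lack a Message-ID header; the sketch
   leaves both out.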

   As a side-effect, the "duplicate" test adds the message ID to an
   internal duplicate tracking list once the Sieve execution finishes
   successfully.  This way, the same test will evaluate to "true" during
   the next Sieve execution.  Implementations MUST prevent making any
   definitive modifications to the internal duplicate tracking list
   until the Sieve script execution finishes successfully.  If failing
   script executions add the message ID to the duplicate tracking list,
   all "duplicate" tests in the Sieve script would erroneously yield
   "true" for the next delivery attempt of the same message, which can
   -- depending on the action taken for a duplicate -- easily lead to
   discarding the message without further notice.

   However, deferring the definitive modification of the tracking list
   to the end of a successful Sieve script execution is not without
   problems.  It can cause a race condition when a duplicate message is
   delivered in parallel before the tracking list is updated.  This way,
   a duplicate message could be missed by the "duplicate" test.  More
   complex implementations could use a locking mechanism to prevent this
   problem.  But, irrespective of what implementation is chosen,
   situations in which the "duplicate" test erroneously yields "true"
   MUST be prevented.

   The "duplicate" test MUST only check for duplicates amongst message
   ID values encountered in previous executions of the Sieve script; it
   MUST NOT consider ID values encountered earlier in the current Sieve
   script execution as potential duplicates.  This means that all
   "duplicate" tests in a Sieve script execution, including those
   located in scripts included using the "include" [INCLUDE] extension,
   MUST always yield the same result if the arguments are identical.

   Implementations SHOULD limit the number of entries in the duplicate
   tracking list.
   When limiting the number of entries, implementations
   SHOULD discard the oldest ones first.

   Also, implementations SHOULD let entries in the tracking list expire
   after a short period of time.  The user can explicitly control the
   length of this expiration time by means of the ":seconds" argument,
   which is always specified in seconds.  If the ":seconds" argument is
   omitted, an appropriate default value MUST be used.  A default
   expiration time of around 7 days is deemed to be appropriate.  Sites
   SHOULD impose a maximum limit on the expiration time.  If that limit
   is exceeded, the maximum value MUST silently be substituted;
   exceeding the limit MUST NOT produce an error.  If the ":seconds"
   argument is zero, the "duplicate" test MUST yield "false"
   unconditionally.

   When the ":last" argument is omitted, the expiration time for entries
   in the duplicate tracking list MUST be measured relative to the
   moment at which the entry was first created; i.e., at the end of the
   successful script execution during which the "duplicate" test
   returned "false" for a message with that particular message ID
   value.  This means that subsequent duplicate messages have no
   influence on the time at which the entry in the duplicate tracking
   list finally expires.

   In contrast, when the ":last" argument is specified, the expiration
   time MUST be measured relative to the last script execution during
   which the "duplicate" test was used to check the entry's message ID
   value.  This effectively means that the entry in the duplicate
   tracking list will not expire while duplicate messages with the
   corresponding message ID keep being delivered within intervals
   smaller than the expiration time.

   By default, the content of the message's Message-ID header field is
   used as the unique ID for duplicate tracking.
   For more complex
   applications, the "duplicate" test can also be used to detect
   duplicate deliveries based on other message text.  Then, the tracked
   unique ID can be an arbitrary string value extracted from the
   message.  By adding the ":header" argument with a message header
   field name, the content of the specified header field can be used as
   the tracked unique ID instead of the default Message-ID header.
   Alternatively, the tracked unique ID can be specified explicitly
   using the ":uniqueid" argument.  The ":header" and ":uniqueid"
   arguments are mutually exclusive and specifying both for a single
   "duplicate" test command MUST trigger an error.

   If the tracked unique ID value is extracted directly from a message
   header field, i.e., when the ":uniqueid" argument is not used,
   leading and trailing whitespace (see Section 2.2 of RFC 5228 [SIEVE])
   MUST first be trimmed from the value before performing the actual
   duplicate verification.  When the ":uniqueid" argument is used, such
   normalization concerns are the responsibility of the user.

   If the header field specified using the ":header" argument exists
   multiple times in the message, only the first occurrence MUST be used
   for duplicate tracking.  If the specified header field is not present
   in the message, the "duplicate" test MUST yield "false"
   unconditionally.  In that case the duplicate tracking list is left
   unmodified by this test, since no unique ID value is available.  The
   same rules apply with respect to the Message-ID header field for the
   basic "duplicate" test without a ":header" or ":uniqueid" argument,
   since that header field could also be missing or occur multiple
   times.

   The string parameter of the ":uniqueid" argument can be composed from
   arbitrary text extracted from the message using the "variables"
   [VARIABLES] extension.
   To extract text from the message body, the
   "foreverypart" and "extracttext" [SIEVE-MIME] extensions need to be
   used as well.  This provides the user with detailed control over what
   identifies a message as a duplicate.

   The tracked unique ID value MUST be matched case-sensitively,
   irrespective of whether it originates from a header or is specified
   explicitly using the ":uniqueid" argument.  To achieve case-
   insensitive behavior, the "set" command added by the "variables"
   [VARIABLES] extension can be used in combination with the ":uniqueid"
   argument to normalize the tracked unique ID value to upper or lower
   case.

   The "duplicate" test MUST track a unique ID value independent of its
   source.  This means that it does not matter whether values are
   obtained from the message ID header, from an arbitrary header
   specified using the ":header" argument or explicitly from the
   ":uniqueid" argument.  For example, for messages with header field
   "Message-ID: <123456@example.com>", the following three examples are
   equivalent and match the same entry in the duplicate tracking list:

   require "duplicate";
   if duplicate {
     discard;
   }

   require "duplicate";
   if duplicate :header "message-id" {
     discard;
   }

   require "duplicate";
   if duplicate :uniqueid "<123456@example.com>" {
     discard;
   }

   Using the ":handle" argument, the duplicate test can be employed for
   multiple independent purposes.  The message is recognized as a
   duplicate only when the tracked unique ID was seen before in an
   earlier script execution by a "duplicate" test with the same
   ":handle" argument.

   NOTE: The necessary mechanism to track duplicate messages is very
   similar to the mechanism that is needed for tracking duplicate
   responses for the "vacation" [VACATION] action.
   One way to implement
   the necessary mechanism for the "duplicate" test is therefore to
   store a hash of the tracked unique ID and, if provided, the ":handle"
   argument.

3.1.  Interaction with Other Sieve Extensions

   The "duplicate" test does not directly support either the "index"
   [DATE-INDEX] or the "mime" [SIEVE-MIME] extension, meaning that none
   of the ":index", ":mime" or associated arguments are added to the
   "duplicate" test when these extensions are active.  The
   ":uniqueid" argument can be used in combination with the "variables"
   [VARIABLES] extension to achieve the same result indirectly.

   Normally, Sieve scripts are executed at final delivery.  However,
   with the "imapsieve" [IMAPSIEVE] extension, Sieve scripts are invoked
   when the IMAP [IMAP] server performs operations on the message store,
   e.g. when messages are uploaded, flagged, or moved to another
   location.  The "duplicate" test is devised for use at final delivery
   and the semantics in "imapsieve" context are left undefined.
   Therefore it is NOT RECOMMENDED to allow the "duplicate" test to be
   used in the context of "imapsieve".

4.  Sieve Capability Strings

   A Sieve implementation that defines the "duplicate" test command will
   advertise the capability string "duplicate".

5.  Examples

5.1.  Example 1

   In this basic example message duplicates are detected by tracking the
   Message-ID header.  Duplicate deliveries are stored in a special
   folder contained in the user's Trash folder.  If the folder does not
   exist, it is created automatically using the "mailbox" [MAILBOX]
   extension.  This way, the user has a chance to recover messages when
   necessary.  Messages that are not recognized as duplicates are stored
   in the user's inbox as normal.

   require ["duplicate", "fileinto", "mailbox"];

   if duplicate {
     fileinto :create "Trash/Duplicate";
   }

5.2.  Example 2

   This example shows a more complex use of the "duplicate" test.  The
   user gets network alerts from a set of remote automated monitoring
   systems.  Multiple notifications can be received about the same event
   from different monitoring systems.  The Message-ID of these messages
   is different, because these are all distinct messages from different
   senders.  To avoid being notified multiple times about the same event
   the user writes the following script:

   require ["duplicate", "variables", "imap4flags",
     "fileinto"];

   if header :matches "subject" "ALERT: *" {
     if duplicate :seconds 60 :uniqueid "${1}" {
       setflag "\\seen";
     }
     fileinto "Alerts";
   }

   The subjects of the notification messages are structured with a
   predictable pattern which includes a description of the event.  In
   the script above the "duplicate" test is used to detect duplicate
   alert events.  The message subject is matched against a pattern and
   the event description is extracted using the "variables" [VARIABLES]
   extension.  If a message with that event in the subject was received
   before, but more than a minute ago, it is not detected as a duplicate
   due to the specified ":seconds" argument.  In the event of a
   duplicate, the message is marked as "seen" using the "imap4flags"
   [IMAP4FLAGS] extension.  All alert messages are put into the "Alerts"
   mailbox irrespective of whether those messages are duplicates or not.

5.3.  Example 3

   This example shows how the "duplicate" test can be used to limit the
   frequency of notifications sent using the "enotify" [NOTIFY]
   extension.  Consider the following scenario: a mail user receives
   XMPP notifications [NOTIFY-XMPP] about new mail through Sieve, but
   sometimes a single contact sends many messages in a short period of
   time.  Now the user wants to prevent being notified of all of those
   messages.
   The user wants to be notified about messages from each
   person at most once per 30 minutes and writes the following script:

   require ["variables", "envelope", "enotify", "duplicate"];

   if envelope :matches "from" "*" { set "sender" "${1}"; }
   if header :matches "subject" "*" { set "subject" "${1}"; }

   if not duplicate :seconds 1800 :uniqueid "${sender}"
   {
     notify :message "[SIEVE] ${sender}: ${subject}"
       "xmpp:user@im.example.com";
   }

   The example shown above uses the message envelope sender rather than
   the Message-ID header as the unique ID for duplicate tracking.

   The example can be extended to allow multiple messages from the same
   sender in close succession as long as the discussed subject is
   different.  This can be achieved as follows:

   require ["variables", "envelope", "enotify", "duplicate"];

   if envelope :matches "from" "*" { set "sender" "${1}"; }
   if header :matches "subject" "*" { set "subject" "${1}"; }

   # account for 'Re:' prefix
   if string :comparator "i;ascii-casemap"
     :matches "${subject}" "Re:*"
   {
     set "subject" "${1}";
   }
   if not duplicate :seconds 1800
     :uniqueid "${sender} ${subject}"
   {
     notify :message "[SIEVE] ${sender}: ${subject}"
       "xmpp:user@im.example.com";
   }

   This uses a combination of the message envelope sender and the
   subject of the message as the unique ID for duplicate tracking.

6.  Security Considerations

   A flood of unique messages could cause the list of tracked message ID
   values to grow indefinitely.  Implementations therefore SHOULD
   implement limits on the number and lifespan of entries in that list.

7.  IANA Considerations

   The following template specifies the IANA registration of the Sieve
   extension specified in this document:

   To: iana@iana.org
   Subject: Registration of new Sieve extension

   Capability name: duplicate
   Description:     Adds test 'duplicate' that can be used to test
                    whether a particular message is a duplicate;
                    i.e., whether a copy of it was seen before by the
                    delivery agent that is executing the Sieve
                    script.
   RFC number:      this RFC
   Contact address: Sieve mailing list

   This information should be added to the list of sieve extensions
   given on http://www.iana.org/assignments/sieve-extensions.

8.  References

8.1.  Normative References

   [DATE-INDEX]
              Freed, N., "Sieve Email Filtering: Date and Index
              Extensions", RFC 5260, July 2008.

   [IMAPSIEVE]
              Leiba, B., "Support for Internet Message Access Protocol
              (IMAP) Events in Sieve", RFC 6785, November 2012.

   [INCLUDE]  Daboo, C. and A. Stone, "Sieve Email Filtering: Include
              Extension", RFC 6609, May 2012.

   [KEYWORDS]
              Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [SIEVE]    Guenther, P. and T. Showalter, "Sieve: An Email Filtering
              Language", RFC 5228, January 2008.

   [SIEVE-MIME]
              Hansen, T. and C. Daboo, "Sieve Email Filtering: MIME Part
              Tests, Iteration, Extraction, Replacement, and Enclosure",
              RFC 5703, October 2009.

   [VARIABLES]
              Homme, K., "Sieve Email Filtering: Variables Extension",
              RFC 5229, January 2008.

8.2.  Informative References

   [IMAP]     Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [IMAP4FLAGS]
              Melnikov, A., "Sieve Email Filtering: Imap4flags
              Extension", RFC 5232, January 2008.

   [MAILBOX]  Melnikov, A., "The Sieve Mail-Filtering Language --
              Extensions for Checking Mailbox Status and Accessing
              Mailbox Metadata", RFC 5490, March 2009.

   [NOTIFY]   Melnikov, A., Leiba, B., Segmuller, W., and T. Martin,
              "Sieve Email Filtering: Extension for Notifications",
              RFC 5435, January 2009.

   [NOTIFY-XMPP]
              Saint-Andre, P. and A. Melnikov, "Sieve Notification
              Mechanism: Extensible Messaging and Presence Protocol
              (XMPP)", RFC 5437, January 2009.

   [VACATION]
              Showalter, T. and N. Freed, "Sieve Email Filtering:
              Vacation Extension", RFC 5230, January 2008.

Author's Address

   Stephan Bosch
   Enschede
   NL

   Email: stephan@rename-it.nl