APPSAWG S.
Bosch
Internet-Draft                                             March 3, 2014
Intended status: Standards Track
Expires: September 4, 2014

         Sieve Email Filtering: Detecting Duplicate Deliveries
                 draft-ietf-appsawg-sieve-duplicate-03

Abstract

   This document defines a new test command "duplicate" for the "Sieve"
   email filtering language. This test adds the ability to detect
   duplications. The main application for this new test is handling
   duplicate deliveries commonly caused by mailing list subscriptions or
   redirected mail addresses. The detection is normally performed by
   matching the message ID to an internal list of message IDs from
   previously delivered messages. For more complex applications, the
   "duplicate" test can also use the content of a specific header or
   other parts of the message.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 4, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
   3.  Test "duplicate"
     3.1.  Interaction with Other Sieve Extensions
   4.  Sieve Capability Strings
   5.  Examples
     5.1.  Example 1
     5.2.  Example 2
     5.3.  Example 3
     5.4.  Example 4
   6.  Security Considerations
   7.  IANA Considerations
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Author's Address

1. Introduction

   This document specifies an extension to the Sieve filtering language
   defined by RFC 5228 [SIEVE]. It adds a test to track whether or not
   a text string was seen before by the delivery agent in an earlier
   execution of the Sieve script. This can be used to detect and handle
   duplicate message deliveries.

   Duplicate deliveries are a common side-effect of being subscribed to
   a mailing list.
   For example, if a member of the list decides to
   reply to both the user and the mailing list itself, the user will
   often get one copy of the message directly and another through the
   mailing list. Also, if someone cross-posts over several mailing
   lists to which the user is subscribed, the user will likely receive a
   copy from each of those lists. In another scenario, the user has
   several redirected mail addresses all pointing to his main mail
   account. If one of the user's contacts sends the message to more
   than one of those addresses, the user will likely receive more than a
   single copy. Using the "duplicate" extension, users have the means
   to detect and handle such duplicates, e.g. by discarding them,
   marking them as "seen", or putting them in a special folder.

   Duplicate messages are normally detected using the Message-ID header
   field, which is required to be unique for each message. However, the
   "duplicate" test is flexible enough to use different criteria for
   defining what makes a message a duplicate, for example, based on the
   subject line or parts of the message body. Other applications of
   this new test command are also possible, as long as the tracked
   unique value is a string.

2. Conventions Used in This Document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [KEYWORDS].

   Conventions for notations are as in [SIEVE] Section 1.1, including
   use of the "Usage:" label for the definition of action and tagged
   arguments syntax.

3. Test "duplicate"

   Usage: "duplicate" [":handle" <handle: string>]
                      [":header" <header-name: string> /
                          ":uniqueid" <value: string>]
                      [":seconds" <timeout: number>] [":last"]

   In its basic form, the "duplicate" test keeps track of which messages
   were seen before by this test during an earlier Sieve execution.
   Messages are by default identified by their message ID as contained
   in the Message-ID header. The "duplicate" test evaluates to "true"
   when the message was seen before and it evaluates to "false" when it
   was not.

   As a side-effect, the "duplicate" test adds the message ID to an
   internal duplicate tracking list once the Sieve execution finishes
   successfully. This way, the same test will evaluate to "true" during
   the next Sieve execution. Note that this side-effect is performed
   only when the "duplicate" test is actually evaluated. If the
   "duplicate" test is nested in a control structure or it is not the
   first item of an "allof" or "anyof" test list, its evaluation depends
   on the result of preceding tests, which may produce unexpected
   results.

   Implementations MUST only update the internal duplicate tracking list
   when the Sieve script execution finishes successfully. If failing
   script executions add the message ID to the duplicate tracking list,
   all "duplicate" tests in the Sieve script would erroneously yield
   "true" for the next delivery attempt of the same message, which can
   -- depending on the action taken for a duplicate -- easily lead to
   discarding the message without further notice.

   However, deferring the definitive modification of the tracking list
   to the end of a successful Sieve script execution is not without
   problems. It can cause a race condition when a duplicate message is
   delivered in parallel before the tracking list is updated. This way,
   a duplicate message could be missed by the "duplicate" test. More
   complex implementations could use a locking mechanism to prevent this
   problem. But, irrespective of what implementation is chosen,
   situations in which the "duplicate" test erroneously yields "true"
   MUST be prevented.
   The "duplicate" test MUST only check for duplicates amongst message
   ID values encountered in previous executions of the Sieve script; it
   MUST NOT consider ID values encountered earlier in the current Sieve
   script execution as potential duplicates. This means that all
   "duplicate" tests in a Sieve script execution, including those
   located in scripts included using the "include" [INCLUDE] extension,
   MUST always yield the same result if the arguments are identical.

   Implementations SHOULD limit the number of entries in the duplicate
   tracking list. When limiting the number of entries, implementations
   SHOULD discard the oldest ones first.

   Also, implementations SHOULD let entries in the tracking list expire
   after a short period of time. The user can explicitly control the
   length of this expiration time by means of the ":seconds" argument,
   which accepts an integer value specifying the timeout value in
   seconds. If the ":seconds" argument is omitted, an appropriate
   default value MUST be used. A default expiration time of around 7
   days is usually appropriate. Sites SHOULD impose a maximum limit on
   the expiration time. If that limit is exceeded by the ":seconds"
   argument, the maximum value MUST silently be substituted; exceeding
   the limit MUST NOT produce an error. If the ":seconds" argument is
   zero, the "duplicate" test MUST yield "false" unconditionally.

   When the ":last" argument is omitted, the expiration time for entries
   in the duplicate tracking list MUST be measured relative to the
   moment at which the entry was first created; i.e., at the end of the
   successful script execution during which the "duplicate" test
   returned "false" for a message with that particular message ID value.
   This means that subsequent duplicate messages have no influence on
   the time at which the entry in the duplicate tracking list finally
   expires.
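
   The ":seconds" rules and the default, creation-relative expiry can be
   summarized in a short, non-normative Python sketch. The 30-day site
   maximum, the helper names, and the strict "less than" comparison at
   the expiry boundary are assumptions of this sketch; this document
   only suggests a default of around 7 days.

```python
DEFAULT_SECONDS = 7 * 24 * 3600   # "around 7 days"
MAX_SECONDS = 30 * 24 * 3600      # assumed site-wide maximum


def effective_timeout(seconds=None):
    """Return the expiration time actually applied for a test."""
    if seconds is None:
        return DEFAULT_SECONDS
    # Values above the site limit are silently clamped, never an error.
    return min(seconds, MAX_SECONDS)


def is_duplicate(created_at, now, seconds=None):
    """created_at: time the entry was first created, or None if absent."""
    timeout = effective_timeout(seconds)
    if timeout == 0 or created_at is None:
        # ":seconds 0" yields "false" unconditionally.
        return False
    # Without ":last", expiry is measured from the entry's creation.
    return now - created_at < timeout
```

   For instance, with an entry created at time 100 and ":seconds 60", a
   delivery at time 150 is treated as a duplicate, whereas a delivery at
   time 200 is not.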
   In contrast, when the ":last" argument is specified, the expiration
   time MUST be measured relative to the last script execution during
   which the "duplicate" test was used to check the entry's message ID
   value. This effectively means that the entry in the duplicate
   tracking list will not expire while duplicate messages with the
   corresponding message ID keep being delivered within intervals
   smaller than the expiration time.

   By default, the content of the message's Message-ID header field is
   used as the unique ID for duplicate tracking. For more complex
   applications, the "duplicate" test can also be used to detect
   duplicate deliveries based on other message text. Then, the tracked
   unique ID can be an arbitrary string value extracted from the
   message. By adding the ":header" argument with a message header
   field name, the content of the specified header field can be used as
   the tracked unique ID instead of the default Message-ID header.
   Alternatively, the tracked unique ID can be specified explicitly
   using the ":uniqueid" argument. The ":header" and ":uniqueid"
   arguments are mutually exclusive and specifying both for a single
   "duplicate" test command MUST trigger an error.

   The syntax rules for the header name parameter of the ":header"
   argument are specified in Section 2.4.2.2 of RFC 5228 [SIEVE]. Note
   that implementations MUST NOT trigger an error for an invalid header
   name. Instead, the "duplicate" test MUST yield "false"
   unconditionally in this case. The parameter of the ":uniqueid"
   argument can be any string.

   The Message-ID header field is assumed to be globally unique as
   required in Section 3.6.4 of RFC 5322 [IMAIL]. In practice, this
   assumption may not always prove to be true. The "duplicate" test
   does not deal with this situation implicitly, which means that false
   duplicates may be detected in this case.
   However, the user can
   address such situations by specifying an alternative means of message
   identification using the ":header" or the ":uniqueid" argument.

   If the tracked unique ID value is extracted directly from a message
   header field, i.e., when the ":uniqueid" argument is not used, the
   following operations MUST be performed before the actual duplicate
   verification:

   o  Unfold the header line as described in [IMAIL] Section 2.2.3 (see
      also Section 2.4.2.2 of RFC 5228 [SIEVE]).

   o  If possible, convert the header value to Unicode, encoded as UTF-8
      (see Section 2.7.2 of RFC 5228 [SIEVE]). If conversion is not
      possible, the value is left unchanged.

   o  Trim leading and trailing whitespace from the header value (see
      Section 2.2 of RFC 5228 [SIEVE]).

   Note that these rules also apply to the Message-ID header field used
   by the basic "duplicate" test without a ":header" or ":uniqueid"
   argument. When the ":uniqueid" argument is used, such normalization
   concerns are the responsibility of the user.

   If the header field specified using the ":header" argument exists
   multiple times in the message, only the first occurrence MUST be used
   for duplicate tracking. If the specified header field is not present
   in the message, the "duplicate" test MUST yield "false"
   unconditionally. In that case the duplicate tracking list is left
   unmodified by this test, since no unique ID value is available. The
   same rules apply with respect to the Message-ID header field for the
   basic "duplicate" test without a ":header" or ":uniqueid" argument,
   since that header field could also be missing or occur multiple
   times.

   The string parameter of the ":uniqueid" argument can be composed from
   arbitrary text extracted from the message using the "variables"
   [VARIABLES] extension.
   To extract text from the message body, the
   "foreverypart" and "extracttext" [SIEVE-MIME] extensions need to be
   used as well. This provides the user with detailed control over what
   identifies a message as a duplicate.

   The tracked unique ID value MUST be matched case-sensitively,
   irrespective of whether it originates from a header or is specified
   explicitly using the ":uniqueid" argument. To achieve
   case-insensitive behavior, the "set" command added by the "variables"
   [VARIABLES] extension can be used in combination with the ":uniqueid"
   argument to normalize the tracked unique ID value to upper or lower
   case.

   The "duplicate" test MUST track a unique ID value independent of its
   source. This means that it does not matter whether values are
   obtained from the Message-ID header, from an arbitrary header
   specified using the ":header" argument or explicitly from the
   ":uniqueid" argument. For example, the following three scripts are
   equivalent and match the same entry in the duplicate tracking list:

      require "duplicate";
      if duplicate {
        discard;
      }

      require "duplicate";
      if duplicate :header "message-id" {
        discard;
      }

      require ["duplicate", "variables"];
      if header :matches "message-id" "*" {
        if duplicate :uniqueid "${0}" {
          discard;
        }
      }

   The ":handle" argument can be used to override this default behavior.
   The ":handle" argument separates a "duplicate" test from other
   duplicate tests with a different or omitted ":handle" argument.
   Using the ":handle" argument, unrelated "duplicate" tests can be
   prevented from interfering with each other: a message is only
   recognized as a duplicate when the tracked unique ID was seen before
   in an earlier script execution by a "duplicate" test with the same
   ":handle" argument.
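
   One way to picture these matching rules is as a lookup key for the
   tracking list. The following non-normative Python sketch builds such
   a key; the function name tracking_key and the tuple layout are
   assumptions of this sketch, not something this document prescribes.

```python
def tracking_key(unique_id, handle=None):
    """Build a hypothetical lookup key for the duplicate tracking list.

    The source of unique_id (Message-ID, ":header" or ":uniqueid") is
    deliberately not part of the key. Only the optional ":handle" value
    separates entries, and the ID is compared case-sensitively.
    """
    return (handle, unique_id)


# Entries recorded by earlier executions (illustrative values only).
seen = {tracking_key("20140303-0001", handle="notifier")}

same_handle = tracking_key("20140303-0001", handle="notifier") in seen
other_handle = tracking_key("20140303-0001", handle="support") in seen
no_handle = tracking_key("20140303-0001") in seen
```

   Here only the lookup with the same handle finds the entry; a test
   with a different or omitted handle does not match it.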
   NOTE: The necessary mechanism to track duplicate messages is very
   similar to the mechanism that is needed for tracking duplicate
   responses for the "vacation" [VACATION] action. One way to implement
   the necessary mechanism for the "duplicate" test is therefore to
   store a hash of the tracked unique ID and, if provided, the ":handle"
   argument.

3.1. Interaction with Other Sieve Extensions

   The "duplicate" test does not support either the "index"
   [DATE-INDEX] or "mime" [SIEVE-MIME] extensions directly, meaning
   that none of the ":index", ":mime" or associated arguments are added
   to the "duplicate" test when these extensions are active. The
   ":uniqueid" argument can be used in combination with the "variables"
   [VARIABLES] extension to achieve the same result indirectly.

   Normally, Sieve scripts are executed at final delivery. However,
   with the "imapsieve" [IMAPSIEVE] extension, Sieve scripts are invoked
   when the IMAP [IMAP] server performs operations on the message store,
   e.g. when messages are uploaded, flagged, or moved to another
   location. The "duplicate" test is devised for use at final delivery
   and the semantics in the "imapsieve" context are left undefined.
   Therefore, it is NOT RECOMMENDED to allow the "duplicate" test to be
   used in the context of "imapsieve".

4. Sieve Capability Strings

   A Sieve implementation that defines the "duplicate" test command will
   advertise the capability string "duplicate".

5. Examples

5.1. Example 1

   In this basic example, message duplicates are detected by tracking
   the Message-ID header. Duplicate deliveries are stored in a special
   folder contained in the user's Trash folder. If the folder does not
   exist, it is created automatically using the "mailbox" [MAILBOX]
   extension. This way, the user has a chance to recover messages when
   necessary.
   Messages that are not recognized as duplicates are stored
   in the user's inbox as normal.

      require ["duplicate", "fileinto", "mailbox"];

      if duplicate {
        fileinto :create "Trash/Duplicate";
      }

5.2. Example 2

   This example shows a more complex use of the "duplicate" test. The
   user gets network alerts from a set of remote automated monitoring
   systems. Several notifications can be received about the same event
   from different monitoring systems. The Message-ID of these messages
   is different, because these are all distinct messages from different
   senders. To avoid being notified more than a single time about the
   same event, the user writes the following script:

      require ["duplicate", "variables", "imap4flags",
        "fileinto"];

      if header :matches "subject" "ALERT: *" {
        if duplicate :seconds 60 :uniqueid "${1}" {
          setflag "\\seen";
        }
        fileinto "Alerts";
      }

   The subjects of the notification messages are structured with a
   predictable pattern which includes a description of the event. In
   the script above, the "duplicate" test is used to detect duplicate
   alert events. The message subject is matched against a pattern and
   the event description is extracted using the "variables" [VARIABLES]
   extension. If a message with that event in the subject was received
   before, but more than a minute ago, it is not detected as a duplicate
   due to the specified ":seconds" argument. In the event of a
   duplicate, the message is marked as "seen" using the "imap4flags"
   [IMAP4FLAGS] extension. All alert messages are put into the "Alerts"
   mailbox irrespective of whether those messages are duplicates or not.

5.3. Example 3

   This example shows how the "duplicate" test can be used to limit the
   frequency of notifications sent using the "enotify" [NOTIFY]
   extension.
   Consider the following scenario: a mail user receives
   XMPP notifications [NOTIFY-XMPP] about new mail through Sieve, but
   sometimes a single contact sends many messages in a short period of
   time. Now the user wants to prevent being notified of all of those
   messages. The user wants to be notified about messages from each
   person at most once per 30 minutes and writes the following script:

      require ["variables", "envelope", "enotify", "duplicate"];

      if envelope :matches "from" "*" { set "sender" "${1}"; }
      if header :matches "subject" "*" { set "subject" "${1}"; }

      if not duplicate :seconds 1800 :uniqueid "${sender}"
      {
        notify :message "[SIEVE] ${sender}: ${subject}"
          "xmpp:user@im.example.com";
      }

   The example shown above uses the message envelope sender rather than
   the Message-ID header as the unique ID for duplicate tracking.

   The example can be extended to allow more messages from the same
   sender in close succession as long as the discussed subject is
   different. This can be achieved as follows:

      require ["variables", "envelope", "enotify", "duplicate"];

      if envelope :matches "from" "*" { set "sender" "${1}"; }
      if header :matches "subject" "*" { set "subject" "${1}"; }

      # account for 'Re:' prefix
      if string :comparator "i;ascii-casemap"
        :matches "${subject}" "Re:*"
      {
        set "subject" "${1}";
      }
      if not duplicate :seconds 1800
        :uniqueid "${sender} ${subject}"
      {
        notify :message "[SIEVE] ${sender}: ${subject}"
          "xmpp:user@im.example.com";
      }

   This uses a combination of the message envelope sender and the
   subject of the message as the unique ID for duplicate tracking.

5.4.
Example 4

   For this example, the mail user uses the "duplicate" test for two
   separate applications: for discarding duplicate events from a
   notification system and to mark certain follow-up messages in a
   software support mailing list as "seen" using the "imap4flags"
   [IMAP4FLAGS] extension.

   The two "duplicate" tests in the following example each use a
   different header to identify messages. However, these "X-Event-ID"
   and "X-Ticket-ID" headers can have similar values in this case (e.g.
   both based on a time stamp), meaning that one "duplicate" test can
   erroneously detect duplicates based on ID values tracked by the
   other. Therefore, the user wants to prevent the second "duplicate"
   test from matching ID values tracked by the first "duplicate" test
   and vice versa. This is achieved by specifying different ":handle"
   arguments for these tests.

      require ["duplicate", "imap4flags"];

      if duplicate :header "X-Event-ID" :handle "notifier" {
        discard;
      }
      if allof (
        duplicate :header "X-Ticket-ID" :handle "support",
        address "to" "support@example.com",
        header :contains "subject" "fileserver")
      {
        setflag "\\seen";
      }

6. Security Considerations

   A flood of unique messages could cause the list of tracked message ID
   values to grow indefinitely. Implementations SHOULD apply limits on
   the number and lifespan of entries in that list.

7. IANA Considerations

   The following template specifies the IANA registration of the Sieve
   extension specified in this document:

   To: iana@iana.org
   Subject: Registration of new Sieve extension

   Capability name: duplicate
   Description:     Adds test 'duplicate' that can be used to test
                    whether a particular message is a duplicate;
                    i.e., whether a copy of it was seen before by the
                    delivery agent that is executing the Sieve
                    script.
   RFC number:      this RFC
   Contact address: Sieve mailing list

   This information should be added to the list of sieve extensions
   given on http://www.iana.org/assignments/sieve-extensions.

8. Acknowledgements

   Thanks to Cyrus Daboo, Arnt Gulbrandsen, Tony Hansen, Kristin Hubner,
   Alexey Melnikov, Subramanian Moonesamy, Tom Petch, Hector Santos,
   Robert Sparks, and Aaron Stone for reviews and suggestions. Special
   thanks to Ned Freed for his guidance and support.

9. References

9.1. Normative References

   [DATE-INDEX]
              Freed, N., "Sieve Email Filtering: Date and Index
              Extensions", RFC 5260, July 2008.

   [IMAIL]    Resnick, P., Ed., "Internet Message Format", RFC 5322,
              October 2008.

   [IMAPSIEVE]
              Leiba, B., "Support for Internet Message Access Protocol
              (IMAP) Events in Sieve", RFC 6785, November 2012.

   [INCLUDE]  Daboo, C. and A. Stone, "Sieve Email Filtering: Include
              Extension", RFC 6609, May 2012.

   [KEYWORDS]
              Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [SIEVE]    Guenther, P. and T. Showalter, "Sieve: An Email Filtering
              Language", RFC 5228, January 2008.

   [SIEVE-MIME]
              Hansen, T. and C. Daboo, "Sieve Email Filtering: MIME Part
              Tests, Iteration, Extraction, Replacement, and Enclosure",
              RFC 5703, October 2009.

   [VARIABLES]
              Homme, K., "Sieve Email Filtering: Variables Extension",
              RFC 5229, January 2008.

9.2. Informative References

   [IMAP]     Crispin, M., "INTERNET MESSAGE ACCESS PROTOCOL - VERSION
              4rev1", RFC 3501, March 2003.

   [IMAP4FLAGS]
              Melnikov, A., "Sieve Email Filtering: Imap4flags
              Extension", RFC 5232, January 2008.

   [MAILBOX]  Melnikov, A., "The Sieve Mail-Filtering Language --
              Extensions for Checking Mailbox Status and Accessing
              Mailbox Metadata", RFC 5490, March 2009.

   [NOTIFY]   Melnikov, A., Leiba, B., Segmuller, W., and T.
              Martin,
              "Sieve Email Filtering: Extension for Notifications",
              RFC 5435, January 2009.

   [NOTIFY-XMPP]
              Saint-Andre, P. and A. Melnikov, "Sieve Notification
              Mechanism: Extensible Messaging and Presence Protocol
              (XMPP)", RFC 5437, January 2009.

   [VACATION]
              Showalter, T. and N. Freed, "Sieve Email Filtering:
              Vacation Extension", RFC 5230, January 2008.

Author's Address

   Stephan Bosch
   Enschede
   NL

   Email: stephan@rename-it.nl