Sieve Working Group                                         K. Murchison
Internet-Draft                                Carnegie Mellon University
Expires: September 24, 2010                                     N. Freed
                                                      Oracle Corporation
                                                          March 23, 2010


          Sieve Email Filtering: Regular Expression Extension
                     draft-ietf-sieve-regex-01.txt

Abstract

   This document describes the "regex" extension to the Sieve email
   filtering language. In some cases, it is desirable to have a string
   matching mechanism which is more powerful than a simple exact match,
   a substring match or a glob-style wildcard match. The regular
   expression matching mechanism defined in this draft provides users
   with much more powerful string matching capabilities.

Change History (to be removed prior to publication as an RFC)

   Changes from draft-murchison-sieve-regex-08:

   o  Updated to XML source.

   o  Documented interaction with variables.

   Changes from draft-ietf-sieve-regex-00:

   o  Various cleanup and updates.

   o  Added trial text specifying comparator interactions.

Open Issues (to be removed prior to publication as an RFC)

   o  The major open issue with this draft is what to do, if anything,
      about localization/internationalization. Are [IEEE.1003-2.1992]
      collating sequences and character equivalents sufficient? Should
      we reference the Unicode technical specification? Should we punt
      and publish the document as experimental?

   o  Is the current approach to comparator integration the right one to
      use?

   o  Should we allow shorthands such as \\b (word boundary) and \\w
      (word character)?

   o  Should we allow backreferences (useful for matching double words,
      etc.)?

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 24, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

1.  Introduction

   Sieve [RFC5228] is a language for filtering email messages at or
   around the time of final delivery.
   It is designed to be implementable on either a mail client or mail
   server.

   The Sieve base specification defines so-called match types for tests:
   is, contains, and matches. An "is" test requires an exact match, a
   "contains" test provides a substring match, and "matches" provides
   glob-style wildcards. This document describes an extension to the
   Sieve language that provides a new match type for regular expression
   comparisons.

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   The terms used to describe the various components of the Sieve
   language are taken from Section 1.1 of [RFC5228].

3.  Capability Identifier

   The capability string associated with the extension defined in this
   document is "regex".

4.  Regex Match Type

   When the regex extension is available, commands that support matching
   may take the optional tagged argument ":regex" to specify that a
   regular expression match should be performed. The ":regex" match
   type is subject to the same rules and restrictions as the standard
   match types defined in [RFC5228].

   The "MATCH-TYPE" syntax element defined in [RFC5228] is augmented
   here as follows:

      MATCH-TYPE =/ ":regex"

5.  Interaction with Sieve comparators

   In order to provide for matches between character sets and case
   insensitivity, Sieve uses the comparators defined in the Internet
   Application Protocol Collation Registry [RFC5228]. The comparator
   used by a given test is specified by the :comparator argument.

   The interaction between collators and the match types defined in the
   Sieve base specification is straightforward. However, the nature of
   regular expressions does not lend itself to this usage for the :regex
   match type.

   A component of the definition of many collators is a normalization
   operation. For example, the "i;octet" comparator employs an identity
   normalization, whereas the "i;ascii-casemap" comparator normalizes
   all lower case ASCII characters to upper case.

   The :regex match type only uses the normalization component of the
   associated comparator. This normalization operation is applied to
   the key-list argument to the test; the result of that normalization
   becomes the target of the regular expression comparison. The
   comparator has no effect on the regular expression pattern or the
   underlying comparison operation.

   It is an error to specify a comparator that has no associated
   normalization operation in conjunction with a :regex match type.

6.  Regular expression comparisons

   Implementations MUST support extended regular expressions (EREs) as
   defined by [IEEE.1003-2.1992]. Any regular expression not defined by
   [IEEE.1003-2.1992], as well as [IEEE.1003-2.1992] basic regular
   expressions, word boundaries, and backreferences, are not supported
   by this extension. Implementations SHOULD reject regular expressions
   that are unsupported by this specification as a syntax error.

   The following tables provide a brief summary of the regular
   expressions that MUST be supported. These tables are presented here
   only as a guideline. [IEEE.1003-2.1992] should be used as the
   definitive reference.

   +------------+------------------------------------------------------+
   | Expression | Pattern                                              |
   +------------+------------------------------------------------------+
   | .          | Match any single character except newline.           |
   | [ ]        | Bracket expression. Match any one of the enclosed    |
   |            | characters. A hyphen (-) indicates a range of        |
   |            | consecutive characters.                              |
   | [^ ]       | Negated bracket expression. Match any one character  |
   |            | NOT in the enclosed list. A hyphen (-) indicates a   |
   |            | range of consecutive characters.                     |
   | \\         | Escape the following special character (match the    |
   |            | literal character). Undefined for other characters.  |
   |            | NOTE: Unlike [IEEE.1003-2.1992], a double-backslash  |
   |            | is required as per section 2.4.2 of [RFC5228].       |
   +------------+------------------------------------------------------+

   Table 1: Items to match a single character

   +------------+------------------------------------------------------+
   | Expression | Pattern                                              |
   +------------+------------------------------------------------------+
   | [: :]      | Character class (alnum, alpha, blank, cntrl, digit,  |
   |            | graph, lower, print, punct, space, upper, xdigit).   |
   | [= =]      | Character equivalents.                               |
   | [. .]      | Collating sequence.                                  |
   +------------+------------------------------------------------------+

   Table 2: Items to be used within a bracket expression (localization)

   +------------+------------------------------------------------------+
   | Expression | Pattern                                              |
   +------------+------------------------------------------------------+
   | ?          | Match zero or one instances.                         |
   | *          | Match zero or more instances.                        |
   | +          | Match one or more instances.                         |
   | {n,m}      | Match any number of instances between n and m        |
   |            | (inclusive). {n} matches exactly n instances. {n,}   |
   |            | matches n or more instances.                         |
   +------------+------------------------------------------------------+

   Table 3: Quantifiers - Items to count the preceding regular
   expression

   +------------+--------------------------------------------+
   | Expression | Pattern                                    |
   +------------+--------------------------------------------+
   | ^          | Match the beginning of the line or string. |
   | $          | Match the end of the line or string.       |
   +------------+--------------------------------------------+

   Table 4: Anchoring - Items to match positions

   +------------+------------------------------------------------------+
   | Expression | Pattern                                              |
   +------------+------------------------------------------------------+
   | |          | Alternation. Match either of the separated regular   |
   |            | expressions.                                         |
   | ( )        | Group the enclosed regular expression(s).            |
   +------------+------------------------------------------------------+

   Table 5: Other constructs

7.  Interaction with Sieve Variables

   This extension is compatible with, and may be used in conjunction
   with, the Sieve Variables extension [RFC5229].

7.1.  Match variables

   A Sieve interpreter that supports both "regex" and "variables" MUST
   set "match variables" (as defined by [RFC5229] section 3.2) whenever
   the ":regex" match type is used. The list of match variables will
   contain the strings corresponding to the group operators in the
   regular expression. The groups are ordered by the position of the
   opening parenthesis, from left to right. Note that in regular
   expressions, expansions match as much as possible (greedy matching).

   Example:

   require ["fileinto", "regex", "variables"];

   if header :regex "List-ID" "<(.*)@" {
       fileinto "lists.${1}"; stop;
   }

   # Imagine the header
   # Subject: [acme-users] [fwd] version 1.0 is out
   # The literal square brackets are escaped with a double backslash.
   if header :regex "Subject" "^\\[(.*)\\] (.*)$" {
       # ${1} will hold "acme-users] [fwd"
       stop;
   }

7.2.  Set modifier :quoteregex

   A Sieve interpreter that supports both "regex" and "variables" MUST
   support the optional tagged argument ":quoteregex" for use with the
   "set" action. The ":quoteregex" modifier is subject to the same
   rules and restrictions as the standard modifiers defined in [RFC5229]
   section 4.
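
   As an illustrative sketch only (it is not part of the specified
   behavior), a script might use ":quoteregex" to reuse captured text
   as a literal inside a second regular expression. The variable names
   "list" and "listre", the choice of headers, and the folder naming
   are assumptions made for this example. The expansion shown in the
   comment assumes that "." is among the characters that the modifier
   prefixes with a backslash.

```sieve
require ["fileinto", "regex", "variables"];

# Capture the list name from the List-ID header.
if header :regex "List-ID" "<(.*)@" {
    # Keep an unquoted copy for the folder name and a quoted copy
    # for use inside a later regular expression.
    set "list" "${1}";
    set :quoteregex "listre" "${1}";

    # If ${list} is "acme.users", ${listre} is assumed to expand to
    # "acme\.users", so the "." can only match a literal dot.
    if header :regex "Subject" "^\\[${listre}\\]" {
        fileinto "lists.${list}";
        stop;
    }
}
```

   The captured text is copied into ordinary variables before the
   second test because that test, when it matches, sets the match
   variables again.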

   For convenience, the "MODIFIER" syntax element defined in [RFC5229]
   is augmented here as follows:

      MODIFIER =/ ":quoteregex"

   This modifier adds the necessary quoting to ensure that the expanded
   text will only match a literal occurrence if used as a parameter to
   :regex. Every character with special meaning (".", "*", "?", etc.)
   is prefixed with "\" in the expansion. This modifier has a
   precedence value of 20 when used with other modifiers.

8.  Examples

   Example:

   require "regex";

   # Try to catch unsolicited email.
   if anyof (
       # if a message is not to me (with optional +detail),
       not address :regex ["to", "cc", "bcc"]
           "me(\\+.*)?@company\\.com",

       # or the subject is all uppercase (no lowercase)
       header :regex :comparator "i;octet" "subject"
           "^[^[:lower:]]+$" ) {

       discard;    # junk it
   }

9.  IANA Considerations

   The following template specifies the IANA registration of the "regex"
   Sieve extension specified in this document:

   To: iana@iana.org
   Subject: Registration of new Sieve extension

   Capability name: regex
   Capability keyword: regex
   Capability arguments: N/A
   Standards Track/IESG-approved experimental RFC number: this RFC
   Person and email address to contact for further information:
      Kenneth Murchison
      E-Mail: murch@andrew.cmu.edu

   This information should be added to the list of Sieve extensions
   given on http://www.iana.org/assignments/sieve-extensions.

10.  Security Considerations

   General Sieve security considerations are discussed in [RFC5228].
   All of the issues described there also apply to regular expression
   matching.

   It is easy to construct problematic regular expressions that are
   computationally infeasible to evaluate.
   Execution of a Sieve script that employs a potentially problematic
   regular expression, such as "(.*)*", may cause problems ranging from
   degradation of performance to outright denial of service. Moreover,
   determining the computational complexity associated with evaluating a
   given regular expression is in general an intractable problem.

   For this reason, all implementations MUST take appropriate steps to
   limit the impact of runaway regular expression evaluation.
   Implementations MAY restrict the regular expressions users are
   allowed to specify. Implementations that do not impose such
   restrictions SHOULD provide a means to abort evaluation of tests
   using the :regex match type if the operation is taking too long.

11.  Normative References

   [IEEE.1003-2.1992]
              Institute of Electrical and Electronics Engineers,
              "Information Technology - Portable Operating System
              Interface (POSIX) - Part 2: Shell and Utilities (Vol. 1)",
              IEEE Standard 1003.2, 1992.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5228]  Guenther, P. and T. Showalter, "Sieve: An Email Filtering
              Language", RFC 5228, January 2008.

   [RFC5229]  Homme, K., "Sieve Email Filtering: Variables Extension",
              RFC 5229, January 2008.

Appendix A.  Acknowledgments

   Most of the text documenting the interaction with Sieve variables was
   taken from an early draft of Kjetil Homme's Sieve variables
   specification.

   Thanks to Tim Showalter, Alexey Melnikov, Tony Hansen, Phil Pennock,
   and Jutta Degener for their help with this document.

Authors' Addresses

   Kenneth Murchison
   Carnegie Mellon University
   5000 Forbes Avenue
   Cyert Hall 285
   Pittsburgh, PA 15213
   US

   Phone: +1 412 268 2638
   Email: murch@andrew.cmu.edu


   Ned Freed
   Oracle Corporation
   800 Royal Oaks
   Monrovia, CA 91016-6347
   USA

   Phone: +1 909 457 4293
   Email: ned.freed@mrochek.com