idnits 2.17.1 draft-ietf-imapext-thread-11.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == It seems as if not all pages are separated by form feeds - found 0 form feeds but 14 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Introduction section. ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The abstract seems to contain references ([RFC-2822], [ABNF], [NEWS], [RFC2822]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 81: '...onnected clients MUST use exactly this...' RFC 2119 keyword, line 166: '...eading algorithm MUST normalize any ms...' RFC 2119 keyword, line 174: '... MUST be interpreted as bei...' RFC 2119 keyword, line 258: '... MUST be kept consistent...' RFC 2119 keyword, line 501: '...plementations of THREAD MUST implement...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (June 2002) is 7979 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'RFC 2822' on line 189 looks like a reference -- Missing reference section? 'ABNF' on line 554 looks like a reference -- Missing reference section? 'NEWS' on line 557 looks like a reference -- Missing reference section? 'RFC-2822' on line 561 looks like a reference Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IMAP Extensions Working Group M. Crispin 3 Internet Draft: IMAP THREAD K. Murchison 4 Document: internet-drafts/draft-ietf-imapext-thread-11.txt June 2002 6 INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION 8 Status of this Memo 10 This document is an Internet-Draft and is in full conformance with 11 all provisions of Section 10 of RFC 2026. 13 Internet-Drafts are working documents of the Internet Engineering 14 Task Force (IETF), its areas, and its working groups. Note that 15 other groups may also distribute working documents as Internet- 16 Drafts. 18 Internet-Drafts are draft documents valid for a maximum of six months 19 and may be updated, replaced, or obsoleted by other documents at any 20 time. It is inappropriate to use Internet-Drafts as reference 21 material or to cite them other than as "work in progress." 23 The list of current Internet-Drafts can be accessed at 24 http://www.ietf.org/ietf/1id-abstracts.txt 26 To view the list Internet-Draft Shadow Directories, see 27 http://www.ietf.org/shadow.html. 29 A revised version of this draft document will be submitted to the RFC 30 editor as a Proposed Standard for the Internet Community. Discussion 31 and suggestions for improvement are requested, and should be sent to 32 ietf-imapext@IMC.ORG. This document will expire before 22 December 33 2002. Distribution of this memo is unlimited. 35 Abstract 37 This document describes the server-based threading extension to the 38 IMAP4rev1 protocol. This extension provides substantial performance 39 improvements for IMAP clients which offer threaded views. 41 A server which supports this extension indicates this with one or 42 more capability names consisting of "THREAD=" followed by a supported 43 threading algorithm name as described in this document. This 44 provides for future upwards-compatible extensions. 46 Base Subject Text 48 Threading uses the "base subject," which has specific subject 49 artifacts of deployed Internet mail software removed. Due to the 50 complexity of these artifacts, the formal syntax for the subject 51 extraction rules is ambiguous. The following procedure is followed 52 to determine the actual "base subject" which is used to thread: 54 (1) Convert any RFC 2047 encoded-words in the subject to 55 UTF-8. Convert all tabs and continuations to space. 56 Convert all multiple spaces to a single space. 58 (2) Remove all trailing text of the subject that matches 59 the subj-trailer ABNF, repeat until no more matches are 60 possible. 62 (3) Remove all prefix text of the subject that matches the 63 subj-leader ABNF. 65 (4) If there is prefix text of the subject that matches the 66 subj-blob ABNF, and removing that prefix leaves a non-empty 67 subj-base, then remove the prefix text. 69 (5) Repeat (3) and (4) until no matches remain. 71 Note: it is possible to defer step (2) until step (6), 72 but this requires checking for subj-trailer in step (4). 74 (6) If the resulting text begins with the subj-fwd-hdr ABNF 75 and ends with the subj-fwd-trl ABNF, remove the 76 subj-fwd-hdr and subj-fwd-trl and repeat from step (2). 78 (7) The resulting text is the "base subject" used in 79 threading. 81 All servers and disconnected clients MUST use exactly this algorithm 82 when threading. Otherwise there is potential for a user to get 83 inconsistent results based on whether they are running in connected 84 or disconnected IMAP mode. 86 Sent Date 88 As used in this document, the term "sent date" refers to the date and 89 time from the Date: header, adjusted by time zone. This differs from 90 date-related criteria in SEARCH, which use just the date and not the 91 time, nor adjusts by time zone. 93 Additional Commands 95 This command is an extension to the IMAP4rev1 base protocol. 97 The section header is intended to correspond with where it would be 98 located in the main document if it was part of the base 99 specification. 101 6.3.THREAD. THREAD Command 103 Arguments: threading algorithm 104 charset specification 105 searching criteria (one or more) 107 Data: untagged responses: THREAD 109 Result: OK - thread completed 110 NO - thread error: can't thread that charset or 111 criteria 112 BAD - command unknown or arguments invalid 114 The THREAD command is a variant of SEARCH with threading semantics 115 for the results. Thread has two arguments before the searching 116 criteria argument; a threading algorithm, and the searching 117 charset. Note that unlike SEARCH, the searching charset argument 118 is mandatory. 120 There is also a UID THREAD command which corresponds to THREAD the 121 way that UID SEARCH corresponds to SEARCH. 123 The THREAD command first searches the mailbox for messages that 124 match the given searching criteria using the charset argument for 125 the interpretation of strings in the searching criteria. It then 126 returns the matching messages in an untagged THREAD response, 127 threaded according to the specified threading algorithm. 129 Sorting is in ascending order. Earlier dates sort before later 130 dates; smaller sizes sort before larger sizes; and strings are 131 sorted according to ascending values established by their 132 collation algorithm (see under "Internationalization 133 Considerations"). 135 The defined threading algorithms are as follows: 137 ORDEREDSUBJECT 138 The ORDEREDSUBJECT threading algorithm is also referred to as 139 "poor man's threading." The searched messages are sorted by 140 base subject and then by the sent date. The messages are then 141 split into separate threads, with each thread containing 142 messages with the same base subject text. Finally, the threads 143 are sorted by the sent date of the first message in the thread. 145 Note that each message in a thread is a child (as opposed to a 146 sibling) of the previous message. 148 REFERENCES 149 The REFERENCES threading algorithm is based on the algorithm 150 written by Jamie Zawinski which was used in "Netscape Mail and 151 News" versions 2.0 through 3.0. For details, see 152 http://www.jwz.org/doc/threading.html. 154 This algorithm threads the searched messages by grouping them 155 together in parent/child relationships based on which messages 156 are replies to others. The parent/child relationships are 157 built using two methods: reconstructing a message's ancestry 158 using the references contained within it; and checking the 159 original (not base) subject of a message to see if it is a 160 reply to (or forward of) another message. 162 Note: "Message ID" in the following description refers to a 163 normalized form of the msg-id in [RFC 2822]. The actual 164 text in an RFC 2822 may use quoting, resulting in multiple 165 ways of expressing the same Message ID. Implementations of 166 the REFERENCES threading algorithm MUST normalize any msg-id 167 in order to avoid false non-matches due to differences in 168 quoting. 170 For example, the msg-id 171 <"01KF8JCEOCBS0045PS"@xxx.yyy.com> 172 and the msg-id 173 <01KF8JCEOCBS0045PS@xxx.yyy.com> 174 MUST be interpreted as being the same Message ID. 176 The references used for reconstructing a message's ancestry are 177 found using the following rules: 179 If a message contains a [NEWS]-style References header line, 180 then use the Message IDs in the References header line as 181 the references. 183 If a message does not contain a References header line, or 184 the References header line does not contain any valid 185 Message IDs, then use the first (if any) valid Message ID 186 found in the In-Reply-To header line as the only reference 187 (parent) for this message. 189 Note: Although [RFC 2822] permits multiple Message IDs in 190 the In-Reply-To header, in actual practice this 191 discipline has not been followed. For example, 192 In-Reply-To headers have been observed with email 193 addresses after the Message ID, and there are no good 194 heuristics for software to determine the difference. 195 This is not a problem with the References header however. 197 If a message does not contain an In-Reply-To header line, or 198 the In-Reply-To header line does not contain a valid Message 199 ID, then the message does not have any references (NIL). 201 A message is considered to be a reply or forward if the base 202 subject extraction rules, applied to the original subject, 203 remove any of the following: a subj-refwd, a "(fwd)" 204 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl. 206 The REFERENCES algorithm is significantly more complex than 207 ORDEREDSUBJECT and consists of six main steps. These steps are 208 outlined in detail below. 210 (1) For each searched message: 212 (A) Using the Message IDs in the message's references, link 213 the corresponding messages (those whose Message-ID header 214 line contains the given reference Message ID) together as 215 parent/child. Make the first reference the parent of the 216 second (and the second a child of the first), the second the 217 parent of the third (and the third a child of the second), 218 etc. The following rules govern the creation of these 219 links: 221 If a message does not contain a Message-ID header line, 222 or the Message-ID header line does not contain a valid 223 Message ID, then assign a unique Message ID to this 224 message. 226 If two or more messages have the same Message ID, assign 227 a unique Message ID to each of the duplicates. 229 If no message can be found with a given Message ID, 230 create a dummy message with this ID. Use this dummy 231 message for all subsequent references to this ID. 233 If a message already has a parent, don't change the 234 existing link. This is done because the References 235 header line may have been truncated by a MUA. As a 236 result, there is no guarantee that the messages 237 corresponding to adjacent Message IDs in the References 238 header line are parent and child. 240 Do not create a parent/child link if creating that link 241 would introduce a loop. For example, before making 242 message A the parent of B, make sure that A is not a 243 descendent of B. 245 Note: Message ID comparisons are case-sensitive. 247 (B) Create a parent/child link between the last reference 248 (or NIL if there are no references) and the current message. 249 If the current message already has a parent, it is probably 250 the result of a truncated References header line, so break 251 the current parent/child link before creating the new 252 correct one. As in step 1.A, do not create the parent/child 253 link if creating that link would introduce a loop. Note 254 that if this message has no references, that it will now 255 have no parent. 257 Note: The parent/child links created in steps 1.A and 1.B 258 MUST be kept consistent with one another at ALL times. 260 (2) Gather together all of the messages that have no parents 261 and make them all children (siblings of one another) of a dummy 262 parent (the "root"). These messages constitute the first 263 (head) message of the threads created thus far. 265 (3) Prune dummy messages from the thread tree. Traverse each 266 thread under the root, and for each message: 268 If it is a dummy message with NO children, delete it. 270 If it is a dummy message with children, delete it, but 271 promote its children to the current level. In other words, 272 splice them in with the dummy's siblings. 274 Do not promote the children if doing so would make them 275 children of the root, unless there is only one child. 277 (4) Sort the messages under the root (top-level siblings only) 278 by sent date. In the case of an exact match on sent date or if 279 either of the Date: headers used in a comparison can not be 280 parsed, use the order in which the messages appear in the 281 mailbox (that is, by sequence number) to determine the order. 282 In the case of a dummy message, sort its children by sent date 283 and then use the first child for the top-level sort. 285 (5) Gather together messages under the root that have the same 286 base subject text. 288 (A) Create a table for associating base subjects with 289 messages, called the subject table. 291 (B) Populate the subject table with one message per each 292 base subject. For each child of the root: 294 (i) Find the subject of this thread, by using the base 295 subject from either the current message or its first 296 child if the current message is a dummy. This is the 297 thread subject. 299 (ii) If the thread subject is empty, skip this message. 301 (iii) Look up the message associated with the thread 302 subject in the subject table. 304 (iv) If there is no message in the subject table with the 305 thread subject, add the current message and the thread 306 subject to the subject table. 308 Otherwise, if the message in the subject table is not a 309 dummy, AND either of the following criteria are true: 311 The current message is a dummy, OR 313 The message in the subject table is a reply or forward 314 and the current message is not. 316 then replace the message in the subject table with the 317 current message. 319 (C) Merge threads with the same thread subject. For each 320 child of the root: 322 (i) Find the message's thread subject as in step 5.B.i 323 above. 325 (ii) If the thread subject is empty, skip this message. 327 (iii) Lookup the message associated with this thread 328 subject in the subject table. 330 (iv) If the message in the subject table is the current 331 message, skip this message. 333 Otherwise, merge the current message with the one in the 334 subject table using the following rules: 336 If both messages are dummies, append the current 337 message's children to the children of the message in 338 the subject table (the children of both messages 339 become siblings), and then delete the current message. 341 If the message in the subject table is a dummy and the 342 current message is not, make the current message a 343 child of the message in the subject table (a sibling 344 of its children). 346 If the current message is a reply or forward and the 347 message in the subject table is not, make the current 348 message a child of the message in the subject table (a 349 sibling of its children). 351 Otherwise, create a new dummy message and make both 352 the current message and the message in the subject 353 table children of the dummy. Then replace the message 354 in the subject table with the dummy message. 356 Note: Subject comparisons are case-insensitive, as 357 described under "Internationalization 358 Considerations." 360 (6) Traverse the messages under the root and sort each set of 361 siblings by sent date. Traverse the messages in such a way 362 that the "youngest" set of siblings are sorted first, and the 363 "oldest" set of siblings are sorted last (grandchildren are 364 sorted before children, etc). In the case of an exact match on 365 sent date or if either of the Date: headers used in a 366 comparison can not be parsed, use the order in which the 367 messages appear in the mailbox (that is, by sequence number) to 368 determine the order. In the case of a dummy message (which can 369 only occur with top-level siblings), use its first child for 370 sorting. 372 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 373 S: * THREAD (166)(167)(168)(169)(172)(170)(171) 374 (173)(174 175 176 178 181 180)(179)(177 183 375 182 188 184 185 186 187 189)(190)(191)(192) 376 (193)(194 195)(196 197 198)(199)(200 202)(201) 377 (203)(204)(205)(206 207)(208) 378 S: A283 OK THREAD completed 379 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" 380 S: * THREAD 381 S: A284 OK THREAD completed 382 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000 383 S: * THREAD (166)(167)(168)(169)(172)((170)(179)) 384 (171)(173)((174)(175)(176)(178)(181)(180)) 385 ((177)(183)(182)(188 (184)(189))(185 186)(187)) 386 (190)(191)(192)(193)((194)(195 196))(197 198) 387 (199)(200 202)(201)(203)(204)(205 206 207)(208) 388 S: A285 OK THREAD completed 390 Note: The line breaks in the first and third client 391 responses are for editorial clarity and do not appear in 392 real THREAD responses. 394 Additional Responses 396 This response is an extension to the IMAP4rev1 base protocol. 398 The section heading of this response is intended to correspond with 399 where it would be located in the main document. 401 7.2.THREAD. THREAD Response 403 Data: zero or more threads 405 The THREAD response occurs as a result of a THREAD or UID THREAD 406 command. It contains zero or more threads. A thread consists of 407 a parenthesized list of thread members. 409 Thread members consist of zero or more message numbers, delimited 410 by spaces, indicating successive parent and child. This continues 411 until the thread splits into multiple sub-threads, at which point 412 the thread nests into multiple sub-threads with the first member 413 of each subthread being siblings at this level. There is no limit 414 to the nesting of threads. 416 The messages numbers refer to those messages that match the search 417 criteria. For THREAD, these are message sequence numbers; for UID 418 THREAD, these are unique identifiers. 420 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96)) 422 The first thread consists only of message 2. The second thread 423 consists of the messages 3 (parent) and 6 (child), after which it 424 splits into two subthreads; the first of which contains messages 4 425 (child of 6, sibling of 44) and 23 (child of 4), and the second of 426 which contains messages 44 (child of 6, sibling of 4), 7 (child of 427 44), and 96 (child of 7). Since some later messages are parents 428 of earlier messages, the messages were probably moved from some 429 other mailbox at different times. 431 -- 2 433 -- 3 434 \-- 6 435 |-- 4 436 | \-- 23 437 | 438 \-- 44 439 \-- 7 440 \-- 96 442 Example: S: * THREAD ((3)(5)) 444 In this example, 3 and 5 are siblings of a parent which does not 445 match the search criteria (and/or does not exist in the mailbox); 446 however they are members of the same thread. 448 Formal Syntax of THREAD commands and Responses 450 thread-data = "THREAD" [SP 1*thread-list] 452 thread-list = "(" thread-members / thread-nested ")" 454 thread-members = nz-number *(SP nz-number) [SP thread-nested] 456 thread-nested = 2*thread-list 458 thread = ["UID" SP] "THREAD" SP thread-algorithm 459 SP search-charset 1*(SP search-key) 461 thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom 463 The following syntax describes base subject extraction rules (2)-(6): 465 subject = *subj-leader [subj-middle] *subj-trailer 467 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" 469 subj-blob = "[" *BLOBCHAR "]" *WSP 471 subj-fwd = subj-fwd-hdr subject subj-fwd-trl 473 subj-fwd-hdr = "[fwd:" 475 subj-fwd-trl = "]" 477 subj-leader = (*subj-blob subj-refwd) / WSP 479 subj-middle = *subj-blob (subj-base / subj-fwd) 480 ; last subj-blob is subj-base if subj-base would 481 ; otherwise be empty 483 subj-trailer = "(fwd)" / WSP 485 subj-base = NONWSP *([*WSP] NONWSP) 486 ; can be a subj-blob 488 BLOBCHAR = %x01-5a / %x5c / %x5e-7f 489 ; any CHAR except '[' and ']' 491 NONWSP = %x01-08 / %x0a-1f / %x21-7f 492 ; any CHAR other than WSP 494 Security Considerations 496 Security issues are not discussed in this memo. 498 Internationalization Considerations 500 By default, strings are threaded according to the "minimum sorting 501 collation algorithm". All implementations of THREAD MUST implement 502 the minimum sorting collation algorithm. 504 In the minimum sorting collation algorithm, the Basic Latin 505 alphabetics (U+0041 to U+005A uppercase, U+0061 to U+007A lowercase) 506 are sorted in a case-insensitive fashion; that is, "A" (U+0041) and 507 "a" (U+0061) are treated as exact equals. The characters U+005B to 508 U+0060 are sorted after the Basic Latin alphabetics; for example, 509 U+005E is sorted after U+005A and U+007A. All other characters are 510 sorted according to their octet values, as expressed in UTF-8. No 511 attempt is made to treat composed characters specially, or to do 512 case-insensitive comparisons of composed characters. 514 Note: this means, among other things, that the composed 515 characters in the Latin-1 Supplement are not compared in 516 what would be considered an ISO 8859-1 "case-insensitive" 517 fashion. Case comparison rules for characters with 518 diacriticals differ between languages; the minimum sorting 519 collation does not attempt to deal with this at all. This 520 is reserved for other sorting collations, which may be 521 language-specific. 523 ;;; *** ITEM FOR DISCUSSION *** 524 ;;; THERE IS SOME CONCERN THAT THIS MINIMUM COLLATION IS TOO MINIMAL, 525 ;;; AND THAT THE "GENERIC UNICODE SORTING COLLATION" DISCUSSED BELOW 526 ;;; NEEDS TO BE THE MINIMUM. ONE SUGGESTION IS UNICODE TECHNICAL 527 ;;; STANDARD 10 (TR-10). IF THIS IS THE MINIMUM, THAT REQUIRES THAT 528 ;;; ALL IMPLEMENTATIONS OF SORT AND THREAD BE UNICODE-SAVVY AT LEAST 529 ;;; TO THE POINT OF IMPLEMENTATION TR-10. IS THIS REALISTIC? DOES 530 ;;; THIS RAISE EXCESSIVE IMPLEMENTATION BARRIERS? 531 Other sorting collations, and the ability to change the sorting 532 collation, will be defined in a separate document dealing with IMAP 533 internationalization. 535 It is anticipated that there will be a generic Unicode sorting 536 collation, which will provide generic case-insensitivity for 537 alphabetic scripts, specification of composed character handling, and 538 language-specific sorting collations. A server which implements 539 non-default sorting collations will modify its sorting behavior 540 according to the selected sorting collation. 542 Non-English translations of "Re" or "Fw"/"Fwd" are not specified for 543 removal in the base subject extraction process. By specifying that 544 only the English forms of the prefixes are used, it becomes a simple 545 display time task to localize the prefix language for the user. If, 546 on the other hand, prefixes in multiple languages are permitted, the 547 result is a geometrically complex, and ultimately unimplementable, 548 task. In order to improve the ability to support non-English display 549 in Internet mail clients, only the English form of these prefixes 550 should be transmitted in Internet mail messages. 552 A. References 554 [ABNF] Crocker, D., and Overell, P. "Augmented BNF for Syntax 555 Specifications: ABNF", RFC 2234, November 1997. 557 [NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET 558 messages", RFC-1036, AT&T Bell Laboratories and Center for Seismic 559 Studies, December, 1987. 561 [RFC-2822] Resnick, P. "Internet Message Format", RFC 2822, April 562 2001. 564 Author's Address 566 Mark R. Crispin 567 Networks and Distributed Computing 568 University of Washington 569 4545 15th Avenue NE 570 Seattle, WA 98105-4527 572 Phone: (206) 543-5762 574 EMail: MRC@CAC.Washington.EDU 576 Kenneth Murchison 577 Oceana Matrix Ltd. 578 21 Princeton Place 579 Orchard Park, NY 14127 581 Phone: (716) 662-8973 x26 583 EMail: ken@oceana.com