idnits 2.17.1

draft-ietf-imapext-thread-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate. This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 13 pages

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([ABNF], [NEWS]), which it
     shouldn't. Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 83: '...onnected clients MUST use exactly this...'

     RFC 2119 keyword, line 227: '... MUST be kept consistent...'

     RFC 2119 keyword, line 455: '...plementations of THREAD MUST implement...'
  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 2000) is 8624 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ABNF' on line 500 looks like a reference

  -- Missing reference section? 'NEWS' on line 503 looks like a reference

     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    IMAP Extensions Working Group                                 M. Crispin
3    Internet Draft: IMAP THREAD                                 K. Murchison
4    September 2000
5    Document: internet-drafts/draft-ietf-imapext-thread-03.txt

7                INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION

9    Status of this Memo

11      This document is an Internet-Draft and is in full conformance with
12      all provisions of Section 10 of RFC 2026.

14      Internet-Drafts are working documents of the Internet Engineering
15      Task Force (IETF), its areas, and its working groups. Note that
16      other groups may also distribute working documents as Internet-
17      Drafts.
19      Internet-Drafts are draft documents valid for a maximum of six months
20      and may be updated, replaced, or obsoleted by other documents at any
21      time. It is inappropriate to use Internet-Drafts as reference
22      material or to cite them other than as "work in progress."

24      The list of current Internet-Drafts can be accessed at
25      http://www.ietf.org/ietf/1id-abstracts.txt

27      To view the list of Internet-Draft Shadow Directories, see
28      http://www.ietf.org/shadow.html.

30      A revised version of this draft document will be submitted to the RFC
31      editor as a Proposed Standard for the Internet Community.

33      Discussion and suggestions for improvement are requested, and should
34      be sent to ietf-imapext@IMC.ORG. This document will expire before 20
35      March 2001. Distribution of this memo is unlimited.

37   Abstract

39      This document describes the server-based threading extension to the
40      IMAP4rev1 protocol. This extension provides substantial performance
41      improvements for IMAP clients which offer threaded views.

43      A server which supports this extension indicates this with one or
44      more capability names consisting of "THREAD=" followed by a supported
45      threading algorithm name as described in this document. This
46      provides for future upwards-compatible extensions.

48   Extracted Subject Text

50      Threading uses a version of the subject which has specific subject
51      artifacts of deployed Internet mail software removed. Due to the
52      complexity of these artifacts, the formal syntax for the subject
53      extraction rules is ambiguous. The following procedure is followed
54      to determine the actual "base subject" which is used to thread:

56           (1) Convert any RFC 2047 encoded-words in the subject to
57           UTF-8. Convert all tabs and continuations to space.
58           Convert all multiple spaces to a single space.

60           (2) Remove all trailing text of the subject that matches
61           the subj-trailer ABNF, repeat until no more matches are
62           possible.
64           (3) Remove all prefix text of the subject that matches the
65           subj-leader ABNF.

67           (4) If there is prefix text of the subject that matches the
68           subj-blob ABNF, and removing that prefix leaves a non-empty
69           subj-base, then remove the prefix text.

71           (5) Repeat (3) and (4) until no matches remain.

73           Note: it is possible to defer step (2) until step (6), but this
74           requires checking for subj-trailer in step (4).

76           (6) If the resulting text begins with the subj-fwd-hdr ABNF
77           and ends with the subj-fwd-trl ABNF, remove the
78           subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

80           (7) The resulting text is the "base subject" used in
81           threading.

83      All servers and disconnected clients MUST use exactly this algorithm
84      when threading. Otherwise there is potential for a user to get
85      inconsistent results based on whether they are running in connected
86      or disconnected IMAP mode.

88   Additional Commands

90      This command is an extension to the IMAP4rev1 base protocol.

92      The section header is intended to correspond with where it would be
93      located in the main document if it were part of the base
94      specification.

96   6.3.THREAD. THREAD Command

98      Arguments:  threading algorithm
99                  charset specification
100                 searching criteria (one or more)

102     Data:       untagged responses: THREAD

104     Result:     OK - thread completed
105                 NO - thread error: can't thread that charset or
106                      criteria
107                 BAD - command unknown or arguments invalid

109        The THREAD command is a variant of SEARCH with threading semantics
110        for the results. THREAD has two arguments before the searching
111        criteria argument: a threading algorithm and the searching
112        charset. Note that unlike SEARCH, the searching charset argument
113        is mandatory.

115        There is also a UID THREAD command which corresponds to THREAD the
116        way that UID SEARCH corresponds to SEARCH.
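The base subject procedure in steps (1)-(7) of "Extracted Subject Text" can be sketched in Python as follows. This is a minimal, non-normative sketch: the function name `base_subject` and the regular expressions are illustrative assumptions that approximate the subj-leader, subj-blob, and subj-trailer rules (for instance, the blob pattern accepts any character other than the brackets, which is looser than BLOBCHAR).

```python
import re
from email.header import decode_header, make_header

# Approximations of the ABNF rules; the names are hypothetical.
BLOB = r"\[[^\[\]]*\][ \t]*"                      # subj-blob
REFWD = r"(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:"   # subj-refwd
LEADER = re.compile(r"(?:" + BLOB + r")*" + REFWD + r"|[ \t]", re.I)
BLOB_RE = re.compile(BLOB)
TRAILER = re.compile(r"(?:\(fwd\)|[ \t])\Z", re.I)  # subj-trailer at the end


def base_subject(raw_subject):
    # Step 1: decode RFC 2047 encoded-words, then collapse tabs,
    # continuations, and runs of spaces into a single space.
    s = str(make_header(decode_header(raw_subject)))
    s = re.sub(r"[ \t\r\n]+", " ", s)
    while True:
        # Step 2: strip trailers until none remain.
        while TRAILER.search(s):
            s = TRAILER.sub("", s)
        # Steps 3-5: strip leaders and blobs until nothing changes.
        changed = True
        while changed:
            changed = False
            m = LEADER.match(s)
            if m:
                s, changed = s[m.end():], True
            m = BLOB_RE.match(s)
            # Step 4 applies only if a non-empty subj-base would remain.
            if m and s[m.end():].strip():
                s, changed = s[m.end():], True
        # Step 6: unwrap "[fwd: ... ]" and start again from step 2.
        if s.lower().startswith("[fwd:") and s.endswith("]"):
            s = s[5:-1]
            continue
        return s  # Step 7: the base subject
```

Under these assumptions, "Re: [list] Hello (fwd)" and "[fwd: Re: Hello]" both reduce to "Hello", while a subject consisting only of "[announce]" is kept whole because removing the blob would leave an empty subj-base.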
118        The THREAD command first searches the mailbox for messages that
119        match the given searching criteria using the charset argument for
120        the interpretation of strings in the searching criteria. It then
121        returns the matching messages in an untagged THREAD response,
122        threaded according to the specified threading algorithm.

124        The defined threading algorithms are as follows:

126        ORDEREDSUBJECT
127           The ORDEREDSUBJECT threading algorithm is also referred to as
128           "poor man's threading." The searched messages are sorted by
129           subject and then by sent date, equivalent to a "SORT (SUBJECT
130           DATE)". The messages are then split into separate threads,
131           with each thread containing messages with the same extracted
132           subject text. Finally, the threads are sorted by the sent date
133           of the first message in the thread.

135           Note that each message in a thread is a child (as opposed to a
136           sibling) of the previous message.

138        REFERENCES
139           The REFERENCES threading algorithm is based on the algorithm
140           written by Jamie Zawinski which was used in "Netscape Mail and
141           News" versions 2.0 through 3.0. For details, see
142           http://www.jwz.org/docs/threading.html.

144           This algorithm threads the searched messages by grouping them
145           together in parent/child relationships based on which messages
146           are replies to others. The parent/child relationships are
147           built using two methods: reconstructing a message's ancestry
148           using the references contained within it; and checking the
149           subject of a message to see if it is a reply to (or forward of)
150           another.

152           The references used for reconstructing a message's ancestry are
153           found using the following rules:

155              If a message contains a [NEWS]-style References header line,
156              then use the Message IDs in the References header line as
157              the references.
159              If a message does not contain a References header line, or
160              the References header line does not contain any valid
161              Message IDs, then use the first (if any) valid Message ID
162              found in the In-Reply-To header line as the only reference
163              (parent) for this message.

165                 NOTE: Although RFC 822 permits multiple Message IDs in
166                 the In-Reply-To header, in actual practice this
167                 discipline has not been followed. For example, In-
168                 Reply-To headers have been observed with email addresses
169                 after the Message ID, and there are no good heuristics
170                 for software to determine the difference. This is not a
171                 problem with the References header however.

173              If a message does not contain an In-Reply-To header line, or
174              the In-Reply-To header line does not contain a valid Message
175              ID, then the message does not have any references (NIL).

177           The REFERENCES algorithm is significantly more complex than
178           ORDEREDSUBJECT and consists of five main steps. These steps
179           are outlined in detail below.

181           (1) For each searched message:

183              (A) Using the Message IDs in the message's references, link
184              the corresponding messages (those whose Message-ID header
185              line contains the given reference Message ID) together as
186              parent/child. Make the first reference the parent of the
187              second (and the second a child of the first), the second the
188              parent of the third (and the third a child of the second),
189              etc. The following rules govern the creation of these
190              links:

192                 If a message does not contain a Message-ID header line,
193                 or the Message-ID header line does not contain a valid
194                 Message ID, then assign a unique Message ID to this
195                 message.

197                 If two or more messages have the same Message ID, assign
198                 a unique Message ID to each of the duplicates.

200                 If no message can be found with a given Message ID,
201                 create a dummy message with this ID. Use this dummy
202                 message for all subsequent references to this ID.
204                 If a message already has a parent, don't change the
205                 existing link. This is done because the References
206                 header line may have been truncated by an MUA. As a
207                 result, there is no guarantee that the messages
208                 corresponding to adjacent Message IDs in the References
209                 header line are parent and child.

211                 Do not create a parent/child link if creating that link
212                 would introduce a loop. For example, before making
213                 message A the parent of B, make sure that A is not a
214                 descendant of B.

216              (B) Create a parent/child link between the last reference
217              (or NIL if there are no references) and the current message.
218              If the current message already has a parent, it is probably
219              the result of a truncated References header line, so break
220              the current parent/child link before creating the new
221              correct one. As in step 1.A, do not create the parent/child
222              link if creating that link would introduce a loop. Note
223              that if this message has no references, it will now
224              have no parent.

226                 NOTE: The parent/child links created in steps 1.A and 1.B
227                 MUST be kept consistent with one another at ALL times.

229           (2) Gather together all of the messages that have no parents
230           and make them all children (siblings of one another) of a dummy
231           parent (the "root"). These messages constitute the first
232           (head) message of the threads created thus far.

234           (3) Prune dummy messages from the thread tree. Traverse each
235           thread under the root, and for each message:

237              If it is a dummy message with NO children, delete it.

239              If it is a dummy message with children, delete it, but
240              promote its children to the current level. In other words,
241              splice them in with the dummy's siblings.

243              Do not promote the children if doing so would make them
244              children of the root, unless there is only one child.

246           (4) Gather together messages under the root that have the same
247           extracted subject text.
249              (A) Create a table for associating extracted subjects with
250              messages.

252              (B) Populate the subject table with one message per
253              extracted subject. For each child of the root:

255                 (i) Find the subject of this thread by extracting the
256                 base subject from the current message, or its first child
257                 if the current message is a dummy.

259                 (ii) If the extracted subject is empty, skip this
260                 message.

262                 (iii) Look up the message associated with this extracted
263                 subject in the table.

265                 (iv) If there is no message in the table with this
266                 subject, add the current message and the extracted
267                 subject to the subject table.

269                 Otherwise, replace the message in the table with the
270                 current message if the message in the table is not a
271                 dummy AND either of the following criteria is true:

273                    The current message is a dummy, OR

275                    The message in the table is a reply or forward (its
276                    original subject contains a subj-refwd part and/or a
277                    "(fwd)" subj-trailer) and the current message is not.

279              (C) Merge threads with the same subject. For each child of
280              the root:

282                 (i) Find the subject of this thread as in step 4.B.i
283                 above.

285                 (ii) If the extracted subject is empty, skip this
286                 message.

288                 (iii) Look up the message associated with this extracted
289                 subject in the table.

291                 (iv) If the message in the table is the current message,
292                 skip this message.

294                 Otherwise, merge the current message with the one in the
295                 table using the following rules:

297                    If both messages are dummies, append the current
298                    message's children to the children of the message in
299                    the table (the children of both messages become
300                    siblings), and then delete the current message.

302                    If the message in the table is a dummy and the current
303                    message is not, make the current message a child of
304                    the message in the table (a sibling of its children).
306                    If the current message is a reply or forward and the
307                    message in the table is not, make the current message
308                    a child of the message in the table (a sibling of its
309                    children).

311                    Otherwise, create a new dummy message and make both
312                    the current message and the message in the table
313                    children of the dummy. Then replace the message in
314                    the table with the dummy message.

316           (5) Traverse the messages under the root and sort each set of
317           siblings by date. Traverse the messages in such a way that the
318           "youngest" set of siblings are sorted first, and the "oldest"
319           set of siblings are sorted last (grandchildren are sorted
320           before children, etc). In the case of an exact match on date,
321           use the order in which the messages appear in the mailbox (that
322           is, by sequence number) to determine the order. In the case of
323           a dummy message (which can only occur with top-level siblings),
324           use its first child for sorting.

326     Example:  C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
327               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
328                  (173)(174 175 176 178 181 180)(179)(177 183
329                  182 188 184 185 186 187 189)(190)(191)(192)
330                  (193)(194 195)(196 197 198)(199)(200 202)(201)
331                  (203)(204)(205)(206 207)(208)
332               S: A283 OK THREAD completed
333               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
334               S: * THREAD
335               S: A284 OK THREAD completed
336               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
337               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
338                  (171)(173)((174)(175)(176)(178)(181)(180))
339                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
340                  (190)(191)(192)(193)((194)(195 196))(197 198)
341                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
342               S: A285 OK THREAD completed

344          Note: The line breaks in the first and third server
345          responses are for editorial clarity and do not appear in
346          real THREAD responses.

348  Additional Responses

350     This response is an extension to the IMAP4rev1 base protocol.
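Steps (1) through (3) of the REFERENCES algorithm described above (linking by references, gathering parentless messages under the root, and pruning dummies) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the `Node` class, the `(number, message_id, references)` input tuples, the generated "unique" Message IDs, and all function names are assumptions. Subject merging (step 4) and date sorting (step 5) are deliberately left out, and `render` simply prints a thread in the parenthesized form used by the examples.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    msg: Optional[int] = None            # message number; None marks a dummy
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)


def link(parent, child):
    """Make child a child of parent, unless that would introduce a loop."""
    node = parent
    while node is not None:              # walk up from the would-be parent
        if node is child:
            return                       # child is already above parent
        node = node.parent
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """messages: iterable of (number, message_id, list_of_reference_ids)."""
    table, pending = {}, []
    for number, msg_id, refs in messages:
        if not msg_id or msg_id in table:          # missing or duplicate ID
            msg_id = f"<unique-{number}@invalid>"  # hypothetical unique ID
        table[msg_id] = Node(msg=number)
        pending.append((table[msg_id], refs))

    def lookup(ref):                     # creates a dummy for unknown IDs
        return table.setdefault(ref, Node())

    for node, refs in pending:
        # Step 1.A: chain adjacent references, keeping existing parents.
        for parent_id, child_id in zip(refs, refs[1:]):
            child = lookup(child_id)
            if child.parent is None:
                link(lookup(parent_id), child)
        # Step 1.B: the last reference becomes the parent of this message.
        if node.parent is not None:
            node.parent.children.remove(node)
            node.parent = None
        if refs:
            link(lookup(refs[-1]), node)

    # Step 2: parentless messages are the children of the implicit root.
    roots = [n for n in table.values() if n.parent is None]
    return prune(roots, at_root=True)


def prune(siblings, at_root):
    """Step 3: drop dummies, promoting their children where allowed."""
    kept = []
    for node in siblings:
        node.children = prune(node.children, at_root=False)
        if node.msg is not None:
            kept.append(node)
        elif at_root and len(node.children) > 1:
            kept.append(node)            # promotion would reach the root
        else:
            for child in node.children:  # empty for a childless dummy
                child.parent = node.parent
            kept.extend(node.children)
    return kept


def render(node):
    """Format one thread in THREAD response syntax, without outer parens."""
    members = []
    while True:
        if node.msg is not None:
            members.append(str(node.msg))
        if len(node.children) != 1:
            break
        node = node.children[0]
    nested = "".join(f"({render(child)})" for child in node.children)
    return " ".join(filter(None, [" ".join(members), nested]))
```

As a usage example, two messages that both reference an ID absent from the mailbox end up as siblings under a retained dummy, which `render` prints in the same nested form as the `((170)(179))` thread in the REFERENCES example above.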
352     The section heading of this response is intended to correspond with
353     where it would be located in the main document.

355  7.2.THREAD. THREAD Response

357     Data:       zero or more threads

359        The THREAD response occurs as a result of a THREAD or UID THREAD
360        command. It contains zero or more threads. A thread consists of
361        a parenthesized list of thread members.

363        Thread members consist of zero or more message numbers, delimited
364        by spaces, indicating successive parent and child. This continues
365        until the thread splits into multiple sub-threads, at which point
366        the thread nests into multiple sub-threads with the first member
367        of each subthread being siblings at this level. There is no limit
368        to the nesting of threads.

370        The message numbers refer to those messages that match the search
371        criteria. For THREAD, these are message sequence numbers; for UID
372        THREAD, these are unique identifiers.

374     Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

376        The first thread consists only of message 2. The second thread
377        consists of the messages 3 (parent) and 6 (child), after which it
378        splits into two subthreads; the first of which contains messages 4
379        (child of 6, sibling of 44) and 23 (child of 4), and the second of
380        which contains messages 44 (child of 6, sibling of 4), 7 (child of
381        44), and 96 (child of 7). Since some later messages are parents
382        of earlier messages, the messages were probably moved from some
383        other mailbox at different times.

385           -- 2

387           -- 3
388              \-- 6
389                  |-- 4
390                  |   \-- 23
391                  |
392                  \-- 44
393                       \-- 7
394                            \-- 96

396     Example:    S: * THREAD ((3)(5))

398        In this example, 3 and 5 are siblings of a parent which does not
399        match the search criteria (and/or does not exist in the mailbox);
400        however they are members of the same thread.
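A client might read the THREAD response described above along the following lines. This is a hedged sketch rather than a complete IMAP parser: the function names `parse_thread` and `parents` are hypothetical, the input is assumed to be a single well-formed untagged response line, and a missing parent (as in the second example) is simply reported as `None`.

```python
import re


def parse_thread(line):
    """Turn '* THREAD (2)(3 6 (4 23)(44 7 96))' into nested lists."""
    tokens = re.findall(r"\(|\)|\d+", line.split("THREAD", 1)[1])
    pos = 0

    def parse_list():
        nonlocal pos
        items = []
        pos += 1                          # skip the opening "("
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(parse_list())  # a nested sub-thread
            else:
                items.append(int(tokens[pos]))
                pos += 1
        pos += 1                          # skip the closing ")"
        return items

    threads = []
    while pos < len(tokens):
        threads.append(parse_list())
    return threads


def parents(threads):
    """Map each message number to its parent (None for a thread head)."""
    result = {}

    def walk(items, parent):
        for item in items:
            if isinstance(item, list):
                walk(item, parent)        # sub-threads share this parent
            else:
                result[item] = parent
                parent = item             # the next member is a child

    for thread in threads:
        walk(thread, None)
    return result
```

For the first example response, `parents` reports 6 as the parent of both 4 and 44, matching the tree diagram. For `((3)(5))` both messages map to `None`, so a client that needs to know they share a thread should keep the nested list from `parse_thread` as well.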
402  Formal Syntax of THREAD commands and Responses

404     thread-data     = "THREAD" [SP 1*thread-list]

406     thread-list     = "(" (thread-members / thread-nested) ")"

408     thread-members  = nz-number *(SP nz-number) [SP thread-nested]

410     thread-nested   = 2*thread-list

412     thread          = ["UID" SP] "THREAD" SP thread-algorithm
413                       SP search-charset 1*(SP search-key)

415     thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom

417     The following syntax describes subject extraction rules (2)-(6):

419     subject         = *subj-leader [subj-middle] *subj-trailer

421     subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

423     subj-blob       = "[" *BLOBCHAR "]" *WSP

425     subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

427     subj-fwd-hdr    = "[fwd:"

429     subj-fwd-trl    = "]"

431     subj-leader     = (*subj-blob subj-refwd) / WSP

433     subj-middle     = *subj-blob (subj-base / subj-fwd)
434                       ; last subj-blob is subj-base if subj-base would
435                       ; otherwise be empty

437     subj-trailer    = "(fwd)" / WSP

439     subj-base       = NONWSP *([*WSP] NONWSP)
440                       ; can be a subj-blob

442     BLOBCHAR        = %x01-5a / %x5c / %x5e-7f
443                       ; any CHAR except '[' and ']'

445     NONWSP          = %x01-08 / %x0a-1f / %x21-7f
446                       ; any CHAR other than WSP

448  Security Considerations

450     Security issues are not discussed in this memo.

452  Internationalization Considerations

454     By default, strings are threaded according to the "minimum sorting
455     collation algorithm". All implementations of THREAD MUST implement
456     the minimum sorting collation algorithm.

458     In the minimum sorting collation algorithm, the Basic Latin
459     alphabetics (U+0041 to U+005A uppercase, U+0061 to U+007A lowercase)
460     are sorted in a case-insensitive fashion; that is, "A" (U+0041) and
461     "a" (U+0061) are treated as exact equals. The characters U+005B to
462     U+0060 are sorted after the Basic Latin alphabetics; for example,
463     U+005E is sorted after U+005A and U+007A. All other characters are
464     sorted according to their octet values, as expressed in UTF-8. No
465     attempt is made to treat composed characters specially, or to do
466     case-insensitive comparisons of composed characters.

468          Note: this means, among other things, that the composed
469          characters in the Latin-1 Supplement are not compared in
470          what would be considered an ISO 8859-1 "case-insensitive"
471          fashion. Case comparison rules for characters with
472          diacriticals differ between languages; the minimum sorting
473          collation does not attempt to deal with this at all. This
474          is reserved for other sorting collations, which may be
475          language-specific.

477     Other sorting collations, and the ability to change the sorting
478     collation, will be defined in a separate document dealing with IMAP
479     internationalization.

481     It is anticipated that there will be a generic Unicode sorting
482     collation, which will provide generic case-insensitivity for
483     alphabetic scripts, specification of composed character handling, and
484     language-specific sorting collations. A server which implements
485     non-default sorting collations will modify its sorting behavior
486     according to the selected sorting collation.

488     Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
489     removal in the extracted subject text process. By specifying that
490     only the English forms of the prefixes are used, it becomes a simple
491     display time task to localize the prefix language for the user. If,
492     on the other hand, prefixes in multiple languages are permitted, the
493     result is a geometrically complex, and ultimately unimplementable,
494     task. In order to improve the ability to support non-English display
495     in Internet mail clients, only the English form of these prefixes
496     should be transmitted in Internet mail messages.

498  A. References

500     [ABNF] Crocker, D., and Overell, P., "Augmented BNF for Syntax
501     Specifications: ABNF", RFC 2234, November 1997.
503     [NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET
504     messages", RFC 1036, AT&T Bell Laboratories and Center for Seismic
505     Studies, December 1987.

506  Authors' Addresses

508     Mark R. Crispin
509     Networks and Distributed Computing
510     University of Washington
511     4545 15th Avenue NE
512     Seattle, WA 98105-4527

514     Phone: (206) 543-5762

516     EMail: MRC@CAC.Washington.EDU

518     Kenneth Murchison
519     Oceana Matrix Ltd.
520     21 Princeton Place
521     Orchard Park, NY 14127

523     Phone: (716) 662-8973 x26

525     EMail: ken@oceana.com