idnits 2.17.1

draft-ietf-imapext-thread-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 13 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([ABNF], [NEWS]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 82: '...onnected clients MUST use exactly this...'
     RFC 2119 keyword, line 234: '... MUST be kept consistent...'
     RFC 2119 keyword, line 476: '...plementations of THREAD MUST implement...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2000) is 8531 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ABNF' on line 521 looks like a reference

  -- Missing reference section? 'NEWS' on line 524 looks like a reference


     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2     IMAP Extensions Working Group                                 M. Crispin
3     Internet Draft: IMAP THREAD                                 K. Murchison
4     Document: internet-drafts/draft-ietf-imapext-thread-06.txt December 2000

6              INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION

8     Status of this Memo

10       This document is an Internet-Draft and is in full conformance with
11       all provisions of Section 10 of RFC 2026.

13       Internet-Drafts are working documents of the Internet Engineering
14       Task Force (IETF), its areas, and its working groups.  Note that
15       other groups may also distribute working documents as Internet-
16       Drafts.

18       Internet-Drafts are draft documents valid for a maximum of six months
19       and may be updated, replaced, or obsoleted by other documents at any
20       time.  It is inappropriate to use Internet-Drafts as reference
21       material or to cite them other than as "work in progress."

23       The list of current Internet-Drafts can be accessed at
24       http://www.ietf.org/ietf/1id-abstracts.txt

26       To view the list of Internet-Draft Shadow Directories, see
27       http://www.ietf.org/shadow.html.

29       A revised version of this draft document will be submitted to the RFC
30       editor as a Proposed Standard for the Internet Community.

32       Discussion and suggestions for improvement are requested, and should
33       be sent to ietf-imapext@IMC.ORG.  This document will expire before 27
34       June 2001.  Distribution of this memo is unlimited.

36    Abstract

38       This document describes the server-based threading extension to the
39       IMAP4rev1 protocol.  This extension provides substantial performance
40       improvements for IMAP clients which offer threaded views.

42       A server which supports this extension indicates this with one or
43       more capability names consisting of "THREAD=" followed by a supported
44       threading algorithm name as described in this document.  This
45       provides for future upwards-compatible extensions.

47    Extracted Subject Text

49       Threading uses a version of the subject which has specific subject
50       artifacts of deployed Internet mail software removed.  Due to the
51       complexity of these artifacts, the formal syntax for the subject
52       extraction rules is ambiguous.  The following procedure is followed
53       to determine the actual "base subject" which is used to thread:

55            (1) Convert any RFC 2047 encoded-words in the subject to
56            UTF-8.  Convert all tabs and continuations to space.
57            Convert all multiple spaces to a single space.

59            (2) Remove all trailing text of the subject that matches
60            the subj-trailer ABNF, repeat until no more matches are
61            possible.

63            (3) Remove all prefix text of the subject that matches the
64            subj-leader ABNF.

66            (4) If there is prefix text of the subject that matches the
67            subj-blob ABNF, and removing that prefix leaves a non-empty
68            subj-base, then remove the prefix text.

70            (5) Repeat (3) and (4) until no matches remain.

72            Note: it is possible to defer step (2) until step (6),
73            but this requires checking for subj-trailer in step (4).

75            (6) If the resulting text begins with the subj-fwd-hdr ABNF
76            and ends with the subj-fwd-trl ABNF, remove the
77            subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

79            (7) The resulting text is the "base subject" used in
80            threading.

82       All servers and disconnected clients MUST use exactly this algorithm
83       when threading.  Otherwise there is potential for a user to get
84       inconsistent results based on whether they are running in connected
85       or disconnected IMAP mode.

87    Sent Date

89       As used in this document, the term "sent date" refers to the date and
90       time from the Date: header, adjusted by time zone.  This differs from
91       date-related criteria in SEARCH, which use just the date and not the
92       time, and do not adjust by time zone.

94    Additional Commands

96       This command is an extension to the IMAP4rev1 base protocol.

98       The section header is intended to correspond with where it would be
99       located in the main document if it were part of the base
100      specification.

102   6.3.THREAD. THREAD Command

104      Arguments:  threading algorithm
105                  charset specification
106                  searching criteria (one or more)

108      Data:       untagged responses: THREAD

110      Result:     OK - thread completed
111                  NO - thread error: can't thread that charset or
112                       criteria
113                  BAD - command unknown or arguments invalid

115         The THREAD command is a variant of SEARCH with threading semantics
116         for the results.  THREAD has two arguments before the searching
117         criteria argument: a threading algorithm and the searching
118         charset.  Note that unlike SEARCH, the searching charset argument
119         is mandatory.
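
Looking back at the base subject procedure under "Extracted Subject Text", steps (2) through (6) can be sketched in Python as follows. This is a minimal illustration and not a reference implementation: the function name base_subject and the regular expressions are assumptions of this sketch, RFC 2047 decoding from step (1) is left out (the input is assumed to be decoded already), and the blob pattern accepts any character other than "[" and "]", which is looser than BLOBCHAR.

```python
import re

# Hypothetical regular-expression renderings of the subject ABNF.
BLOB = r"\[[^\[\]]*\][ \t]*"                         # subj-blob
REFWD = r"(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:"      # subj-refwd
LEADER_RUN = re.compile(r"(?:(?:" + BLOB + r")*" + REFWD + r"|[ \t])+",
                        re.IGNORECASE)               # one or more subj-leader
TRAILER_RUN = re.compile(r"(?:\(fwd\)|[ \t])+\Z", re.IGNORECASE)
BLOB_PREFIX = re.compile(BLOB)


def base_subject(subject: str) -> str:
    """Return the base subject of an already-decoded Subject string."""
    # Step (1) without RFC 2047 decoding: tabs, continuations and runs
    # of spaces become a single space.
    s = re.sub(r"[ \t\r\n]+", " ", subject)
    while True:
        # Step (2): remove every trailing "(fwd)" and whitespace.
        s = TRAILER_RUN.sub("", s)
        # Steps (3)-(5): remove leaders and removable blobs until stable.
        while True:
            before = s
            m = LEADER_RUN.match(s)
            if m:
                s = s[m.end():]
            m = BLOB_PREFIX.match(s)
            # Step (4): only drop the blob if something is left over.
            if m and m.end() < len(s):
                s = s[m.end():]
            if s == before:
                break
        # Step (6): unwrap "[fwd: ... ]" and start again from step (2).
        if s.lower().startswith("[fwd:") and s.endswith("]"):
            s = s[5:-1]
        else:
            return s  # step (7)
```

For instance, base_subject("Re: [list] Re: Hello (fwd)") yields "Hello", while a subject consisting only of "[imap]" is kept as is because removing the blob would leave an empty base.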

121         There is also a UID THREAD command which corresponds to THREAD the
122         way that UID SEARCH corresponds to SEARCH.

124         The THREAD command first searches the mailbox for messages that
125         match the given searching criteria using the charset argument for
126         the interpretation of strings in the searching criteria.  It then
127         returns the matching messages in an untagged THREAD response,
128         threaded according to the specified threading algorithm.

130         The defined threading algorithms are as follows:

132         ORDEREDSUBJECT
133            The ORDEREDSUBJECT threading algorithm is also referred to as
134            "poor man's threading."  The searched messages are sorted by
135            subject and then by the sent date.  The messages are then split
136            into separate threads, with each thread containing messages
137            with the same extracted subject text.  Finally, the threads are
138            sorted by the sent date of the first message in the thread.

140            Note that each message in a thread is a child (as opposed to a
141            sibling) of the previous message.

143         REFERENCES
144            The REFERENCES threading algorithm is based on the algorithm
145            written by Jamie Zawinski which was used in "Netscape Mail and
146            News" versions 2.0 through 3.0.  For details, see
147            http://www.jwz.org/doc/threading.html.

149            This algorithm threads the searched messages by grouping them
150            together in parent/child relationships based on which messages
151            are replies to others.  The parent/child relationships are
152            built using two methods: reconstructing a message's ancestry
153            using the references contained within it; and checking the
154            subject of a message to see if it is a reply to (or forward of)
155            another.

157            The references used for reconstructing a message's ancestry are
158            found using the following rules:

160               If a message contains a [NEWS]-style References header line,
161               then use the Message IDs in the References header line as
162               the references.

164               If a message does not contain a References header line, or
165               the References header line does not contain any valid
166               Message IDs, then use the first (if any) valid Message ID
167               found in the In-Reply-To header line as the only reference
168               (parent) for this message.

170                  Note: Although RFC 822 permits multiple Message IDs in
171                  the In-Reply-To header, in actual practice this
172                  discipline has not been followed.  For example,
173                  In-Reply-To headers have been observed with email
174                  addresses after the Message ID, and there are no good
175                  heuristics for software to determine the difference.
176                  This is not a problem with the References header however.

178               If a message does not contain an In-Reply-To header line, or
179               the In-Reply-To header line does not contain a valid Message
180               ID, then the message does not have any references (NIL).

182            The REFERENCES algorithm is significantly more complex than
183            ORDEREDSUBJECT and consists of six main steps.  These steps are
184            outlined in detail below.

186            (1) For each searched message:

188               (A) Using the Message IDs in the message's references, link
189               the corresponding messages (those whose Message-ID header
190               line contains the given reference Message ID) together as
191               parent/child.  Make the first reference the parent of the
192               second (and the second a child of the first), the second the
193               parent of the third (and the third a child of the second),
194               etc.  The following rules govern the creation of these
195               links:

197                  If a message does not contain a Message-ID header line,
198                  or the Message-ID header line does not contain a valid
199                  Message ID, then assign a unique Message ID to this
200                  message.

202                  If two or more messages have the same Message ID, assign
203                  a unique Message ID to each of the duplicates.

205                  If no message can be found with a given Message ID,
206                  create a dummy message with this ID.  Use this dummy
207                  message for all subsequent references to this ID.

209                  If a message already has a parent, don't change the
210                  existing link.  This is done because the References
211                  header line may have been truncated by a MUA.  As a
212                  result, there is no guarantee that the messages
213                  corresponding to adjacent Message IDs in the References
214                  header line are parent and child.

216                  Do not create a parent/child link if creating that link
217                  would introduce a loop.  For example, before making
218                  message A the parent of B, make sure that A is not a
219                  descendent of B.

221                  Note: Message ID comparisons are case-sensitive.

223               (B) Create a parent/child link between the last reference
224               (or NIL if there are no references) and the current message.
225               If the current message already has a parent, it is probably
226               the result of a truncated References header line, so break
227               the current parent/child link before creating the new
228               correct one.  As in step 1.A, do not create the parent/child
229               link if creating that link would introduce a loop.  Note
230               that if this message has no references, it will now
231               have no parent.

233                  Note: The parent/child links created in steps 1.A and 1.B
234                  MUST be kept consistent with one another at ALL times.

236            (2) Gather together all of the messages that have no parents
237            and make them all children (siblings of one another) of a dummy
238            parent (the "root").  These messages constitute the first
239            (head) message of the threads created thus far.

241            (3) Prune dummy messages from the thread tree.  Traverse each
242            thread under the root, and for each message:

244               If it is a dummy message with NO children, delete it.

246               If it is a dummy message with children, delete it, but
247               promote its children to the current level.  In other words,
248               splice them in with the dummy's siblings.

250               Do not promote the children if doing so would make them
251               children of the root, unless there is only one child.

253            (4) Sort the messages under the root (top-level siblings only)
254            by sent date.  In the case of an exact match on sent date or if
255            either of the Date: headers used in a comparison cannot be
256            parsed, use the order in which the messages appear in the
257            mailbox (that is, by sequence number) to determine the order.
258            In the case of a dummy message, sort its children by sent date
259            and then use the first child for the top-level sort.

261            (5) Gather together messages under the root that have the same
262            extracted subject text.

264               (A) Create a table for associating extracted subjects with
265               messages.

267               (B) Populate the subject table with one message per
268               extracted subject.  For each child of the root:

270                  (i) Find the subject of this thread by extracting the
271                  base subject from the current message, or its first child
272                  if the current message is a dummy.

274                  (ii) If the extracted subject is empty, skip this
275                  message.

277                  (iii) Look up the message associated with this extracted
278                  subject in the table.

280                  (iv) If there is no message in the table with this
281                  subject, add the current message and the extracted
282                  subject to the subject table.

284                  Otherwise, replace the message in the table with the
285                  current message if the message in the table is not a
286                  dummy AND either of the following criteria is true:

288                     The current message is a dummy, OR

290                     The message in the table is a reply or forward (its
291                     original subject contains a subj-refwd part and/or a
292                     "(fwd)" subj-trailer) and the current message is not.

294               (C) Merge threads with the same subject.  For each child of
295               the root:

297                  (i) Find the subject of this thread as in step 5.B.i
298                  above.

300                  (ii) If the extracted subject is empty, skip this
301                  message.

303                  (iii) Look up the message associated with this extracted
304                  subject in the table.

306                  (iv) If the message in the table is the current message,
307                  skip this message.

309                  Otherwise, merge the current message with the one in the
310                  table using the following rules:

312                     If both messages are dummies, append the current
313                     message's children to the children of the message in
314                     the table (the children of both messages become
315                     siblings), and then delete the current message.

317                     If the message in the table is a dummy and the current
318                     message is not, make the current message a child of
319                     the message in the table (a sibling of its children).

321                     If the current message is a reply or forward and the
322                     message in the table is not, make the current message
323                     a child of the message in the table (a sibling of its
324                     children).

326                     Otherwise, create a new dummy message and make both
327                     the current message and the message in the table
328                     children of the dummy.  Then replace the message in
329                     the table with the dummy message.

331                     Note: Subject comparisons are case-insensitive, as
332                     described under "Internationalization
333                     Considerations."

335            (6) Traverse the messages under the root and sort each set of
336            siblings by sent date.  Traverse the messages in such a way
337            that the "youngest" set of siblings are sorted first, and the
338            "oldest" set of siblings are sorted last (grandchildren are
339            sorted before children, etc).  In the case of an exact match on
340            sent date or if either of the Date: headers used in a
341            comparison cannot be parsed, use the order in which the
342            messages appear in the mailbox (that is, by sequence number) to
343            determine the order.  In the case of a dummy message (which can
344            only occur with top-level siblings), use its first child for
345            sorting.
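
Steps (1) through (3) of the REFERENCES algorithm above can be sketched in Python as shown here. This is an illustration under stated assumptions rather than a complete implementation: the Container class, the helper names, and the message dictionaries with "id" and "refs" keys are all hypothetical, every message is assumed to already carry a unique, valid Message ID, and the sorting and subject merging of steps (4) through (6) are not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False, repr=False)  # identity comparison; links are cyclic
class Container:
    msgid: str
    message: Optional[dict] = None  # None marks a dummy message
    parent: Optional["Container"] = None
    children: List["Container"] = field(default_factory=list)


def would_loop(parent: Container, child: Container) -> bool:
    """True if parent is child itself or already sits below child."""
    node: Optional[Container] = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def link(parent: Container, child: Container) -> None:
    """Attach a parentless child unless that would introduce a loop."""
    if not would_loop(parent, child):
        child.parent = parent
        parent.children.append(child)


def unlink(child: Container) -> None:
    """Break the existing parent/child link, if any."""
    if child.parent is not None:
        child.parent.children.remove(child)
        child.parent = None


def prune(siblings: List[Container],
          parent: Optional[Container]) -> List[Container]:
    """Step (3): delete dummies, promoting their children where allowed."""
    kept: List[Container] = []
    for node in siblings:
        node.children = prune(node.children, node)
        if node.message is not None:
            kept.append(node)
        elif parent is None and len(node.children) > 1:
            kept.append(node)  # several children may not move up to the root
        else:
            kept.extend(node.children)  # nothing to add for a childless dummy
    for node in kept:
        node.parent = parent
    return kept


def thread_roots(messages: List[dict]) -> List[Container]:
    table: Dict[str, Container] = {}

    def get(msgid: str) -> Container:
        if msgid not in table:
            table[msgid] = Container(msgid)  # dummy until a message shows up
        return table[msgid]

    for msg in messages:
        this = get(msg["id"])
        this.message = msg
        prev = None
        for ref in msg["refs"]:
            cur = get(ref)
            if prev is not None and cur.parent is None:
                link(prev, cur)  # step 1.A: never replace an existing link
            prev = cur
        unlink(this)  # step 1.B: the last reference takes precedence
        if prev is not None:
            link(prev, this)
    roots = [c for c in table.values() if c.parent is None]  # step (2)
    return prune(roots, None)
```

Here a parent of None stands in for the root of step (2). A dummy with a single child is replaced by that child even at the top level, while a dummy with several children is kept there, matching the exception in step (3).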

347      Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
348                  S: * THREAD (166)(167)(168)(169)(172)(170)(171)
349                     (173)(174 175 176 178 181 180)(179)(177 183
350                     182 188 184 185 186 187 189)(190)(191)(192)
351                     (193)(194 195)(196 197 198)(199)(200 202)(201)
352                     (203)(204)(205)(206 207)(208)
353                  S: A283 OK THREAD completed
354                  C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
355                  S: * THREAD
356                  S: A284 OK THREAD completed
357                  C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
358                  S: * THREAD (166)(167)(168)(169)(172)((170)(179))
359                     (171)(173)((174)(175)(176)(178)(181)(180))
360                     ((177)(183)(182)(188 (184)(189))(185 186)(187))
361                     (190)(191)(192)(193)((194)(195 196))(197 198)
362                     (199)(200 202)(201)(203)(204)(205 206 207)(208)
363                  S: A285 OK THREAD completed

365           Note: The line breaks in the first and third server
366           responses are for editorial clarity and do not appear in
367           real THREAD responses.

369   Additional Responses

371      This response is an extension to the IMAP4rev1 base protocol.

373      The section heading of this response is intended to correspond with
374      where it would be located in the main document.

376   7.2.THREAD. THREAD Response

378      Data:       zero or more threads

380         The THREAD response occurs as a result of a THREAD or UID THREAD
381         command.  It contains zero or more threads.  A thread consists of
382         a parenthesized list of thread members.

384         Thread members consist of zero or more message numbers, delimited
385         by spaces, indicating successive parent and child.  This continues
386         until the thread splits into multiple sub-threads, at which point
387         the thread nests into multiple sub-threads with the first member
388         of each subthread being siblings at this level.  There is no limit
389         to the nesting of threads.

391         The message numbers refer to those messages that match the search
392         criteria.  For THREAD, these are message sequence numbers; for UID
393         THREAD, these are unique identifiers.
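
A client might turn the parenthesized THREAD response described above into a tree along the following lines. This is a minimal Python sketch: the names ThreadNode and parse_thread_response are assumptions of this illustration, and a node whose number is None stands for a parent that is not among the returned messages.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThreadNode:
    number: Optional[int]  # None: parent that is absent from the results
    children: List["ThreadNode"] = field(default_factory=list)


def parse_thread_response(line: str) -> List[ThreadNode]:
    """Parse an untagged THREAD response line into a list of thread trees."""
    _, keyword, payload = line.partition("THREAD")
    if not keyword:
        raise ValueError("not a THREAD response")
    tokens = re.findall(r"[()]|\d+", payload)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def parse_list() -> ThreadNode:
        nonlocal pos
        pos += 1  # consume "("
        holder = ThreadNode(None)
        tail = holder
        # Successive numbers form a parent -> child chain.
        while peek() is not None and peek().isdigit():
            node = ThreadNode(int(tokens[pos]))
            tail.children.append(node)
            tail = node
            pos += 1
        # Nested lists are sibling sub-threads below the last member.
        while peek() == "(":
            tail.children.append(parse_list())
        if peek() != ")":
            raise ValueError("unbalanced parentheses")
        pos += 1
        if not holder.children:
            raise ValueError("empty thread list")
        # Without leading numbers the holder itself is the missing parent.
        return holder if tail is holder else holder.children[0]

    threads = []
    while peek() == "(":
        threads.append(parse_list())
    if peek() is not None:
        raise ValueError("unexpected token")
    return threads
```

Parsing "* THREAD ((3)(5))" with this sketch gives a single thread whose placeholder parent has the two children 3 and 5, and a bare "* THREAD" gives an empty list.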

395      Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

397         The first thread consists only of message 2.  The second thread
398         consists of the messages 3 (parent) and 6 (child), after which it
399         splits into two subthreads; the first of which contains messages 4
400         (child of 6, sibling of 44) and 23 (child of 4), and the second of
401         which contains messages 44 (child of 6, sibling of 4), 7 (child of
402         44), and 96 (child of 7).  Since some later messages are parents
403         of earlier messages, the messages were probably moved from some
404         other mailbox at different times.

406         -- 2

408         -- 3
409            \-- 6
410                |-- 4
411                |   \-- 23
412                |
413                \-- 44
414                     \-- 7
415                          \-- 96

417      Example:    S: * THREAD ((3)(5))

419         In this example, 3 and 5 are siblings of a parent which does not
420         match the search criteria (and/or does not exist in the mailbox);
421         however they are members of the same thread.

423   Formal Syntax of THREAD Commands and Responses

425      thread-data      = "THREAD" [SP 1*thread-list]

427      thread-list      = "(" (thread-members / thread-nested) ")"

429      thread-members   = nz-number *(SP nz-number) [SP thread-nested]

431      thread-nested    = 2*thread-list

433      thread           = ["UID" SP] "THREAD" SP thread-algorithm
434                         SP search-charset 1*(SP search-key)

436      thread-algorithm = "ORDEREDSUBJECT" / "REFERENCES" / atom

438      The following syntax describes subject extraction rules (2)-(6):

440      subject          = *subj-leader [subj-middle] *subj-trailer

442      subj-refwd       = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

444      subj-blob        = "[" *BLOBCHAR "]" *WSP

446      subj-fwd         = subj-fwd-hdr subject subj-fwd-trl

448      subj-fwd-hdr     = "[fwd:"

450      subj-fwd-trl     = "]"

452      subj-leader      = (*subj-blob subj-refwd) / WSP

454      subj-middle      = *subj-blob (subj-base / subj-fwd)
455                          ; last subj-blob is subj-base if subj-base would
456                          ; otherwise be empty

458      subj-trailer     = "(fwd)" / WSP

460      subj-base        = NONWSP *([*WSP] NONWSP)
461                          ; can be a subj-blob

463      BLOBCHAR         = %x01-5a / %x5c / %x5e-7f
464                          ; any CHAR except '[' and ']'

466      NONWSP           = %x01-08 / %x0a-1f / %x21-7f
467                          ; any CHAR other than WSP

469   Security Considerations

471      Security issues are not discussed in this memo.

473   Internationalization Considerations

475      By default, strings are threaded according to the "minimum sorting
476      collation algorithm".  All implementations of THREAD MUST implement
477      the minimum sorting collation algorithm.

479      In the minimum sorting collation algorithm, the Basic Latin
480      alphabetics (U+0041 to U+005A uppercase, U+0061 to U+007A lowercase)
481      are sorted in a case-insensitive fashion; that is, "A" (U+0041) and
482      "a" (U+0061) are treated as exact equals.  The characters U+005B to
483      U+0060 are sorted after the Basic Latin alphabetics; for example,
484      U+005E is sorted after U+005A and U+007A.  All other characters are
485      sorted according to their octet values, as expressed in UTF-8.  No
486      attempt is made to treat composed characters specially, or to do
487      case-insensitive comparisons of composed characters.

489           Note: this means, among other things, that the composed
490           characters in the Latin-1 Supplement are not compared in
491           what would be considered an ISO 8859-1 "case-insensitive"
492           fashion.  Case comparison rules for characters with
493           diacriticals differ between languages; the minimum sorting
494           collation does not attempt to deal with this at all.  This
495           is reserved for other sorting collations, which may be
496           language-specific.

498      Other sorting collations, and the ability to change the sorting
499      collation, will be defined in a separate document dealing with IMAP
500      internationalization.

502      It is anticipated that there will be a generic Unicode sorting
503      collation, which will provide generic case-insensitivity for
504      alphabetic scripts, specification of composed character handling, and
505      language-specific sorting collations.  A server which implements
506      non-default sorting collations will modify its sorting behavior
507      according to the selected sorting collation.
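
One way to realize the minimum sorting collation described above is a sort key that folds the Basic Latin lowercase letters to uppercase and otherwise keeps the UTF-8 octets unchanged. The Python sketch below illustrates that idea; the function name min_collation_key is an assumption of this example. Folding toward uppercase (rather than lowercase) is what places U+005B to U+0060 after all of the letters.

```python
def min_collation_key(text: str) -> bytes:
    """Sort key for the minimum sorting collation (illustrative sketch)."""
    # Octets 0x61-0x7A can only be the letters a-z in UTF-8, since every
    # octet of a multi-octet sequence is 0x80 or above.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b
                 for b in text.encode("utf-8"))


subjects = ["b", "A", "^", "a1"]
ordered = sorted(subjects, key=min_collation_key)
```

With this key, "a" and "A" compare equal, "^" (U+005E) sorts after both "Z" and "z", and characters outside Basic Latin such as "é" and "É" remain distinct, in line with the text above.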

509      Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
510      removal in the extracted subject text process.  By specifying that
511      only the English forms of the prefixes are used, it becomes a simple
512      display time task to localize the prefix language for the user.  If,
513      on the other hand, prefixes in multiple languages are permitted, the
514      result is a geometrically complex, and ultimately unimplementable,
515      task.  In order to improve the ability to support non-English display
516      in Internet mail clients, only the English form of these prefixes
517      should be transmitted in Internet mail messages.

519   A. References

521      [ABNF] Crocker, D., and Overell, P., "Augmented BNF for Syntax
522      Specifications: ABNF", RFC 2234, November 1997.

524      [NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET
525      messages", RFC 1036, AT&T Bell Laboratories and Center for Seismic
526      Studies, December 1987.

528   Authors' Addresses

530      Mark R. Crispin
531      Networks and Distributed Computing
532      University of Washington
533      4545 15th Avenue NE
534      Seattle, WA  98105-4527

536      Phone: (206) 543-5762

538      EMail: MRC@CAC.Washington.EDU

540      Kenneth Murchison
541      Oceana Matrix Ltd.
542      21 Princeton Place
543      Orchard Park, NY 14127

545      Phone: (716) 662-8973 x26

547      EMail: ken@oceana.com