idnits 2.17.1 draft-ietf-imapext-sort-17.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3667, Section 5.1 on line 822. -- Found old boilerplate from RFC 3978, Section 5.5 on line 858. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 869. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 876. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 882. ** Found boilerplate matching RFC 3979, Section 5, paragraph 1 (on line 869), which is fine, but *also* found old RFC 2026, Section 10.4A text on line 838. ** Found boilerplate matching RFC 3979, Section 5, paragraph 3 (on line 882), which is fine, but *also* found old RFC 2026, Section 10.4B text on line 844. ** The document claims conformance with section 10 of RFC 2026, but uses some RFC 3978/3979 boilerplate. As RFC 3978/3979 replaces section 10 of RFC 2026, you should not claim conformance with it if you have changed to using RFC 3978/3979 boilerplate. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** The document seems to lack an RFC 3978 Section 5.4 Reference to BCP 78 -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. 
** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 917 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 92 instances of too long lines in the document, the longest one being 6 characters in excess of 72. ** The abstract seems to contain references ([IMAP]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? RFC 2119 keyword, line 55: '...e SORT extension SHOULD also implement...' RFC 2119 keyword, line 117: '... MUST use exactly this algorithm wh...' 
RFC 2119 keyword, line 128: '...he date and time SHOULD be treated as ...' RFC 2119 keyword, line 129: '...nvalid, the time SHOULD be treated as ...' RFC 2119 keyword, line 130: '...the date and time SHOULD be treated as...' (9 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == In addition to RFC 3979, Section 5, paragraph 1 boilerplate, a section with a similar start was also found: The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat. == In addition to RFC 3979, Section 5, paragraph 3 boilerplate, a section with a similar start was also found: The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director. == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. 
If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 2004) is 7283 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Missing reference section? 'IMAP' on line 773 looks like a reference -- Missing reference section? 'IMAP-MODELS' on line 790 looks like a reference -- Missing reference section? 'IMAP-I18N' on line 776 looks like a reference -- Missing reference section? 'KEYWORDS' on line 779 looks like a reference -- Missing reference section? 'ABNF' on line 762 looks like a reference -- Missing reference section? 'IMAP-MODEL' on line 116 looks like a reference -- Missing reference section? 'CHARSET' on line 766 looks like a reference -- Missing reference section? 'THREADING' on line 793 looks like a reference -- Missing reference section? 'RFC 2822' on line 368 looks like a reference -- Missing reference section? 'COMPARATOR' on line 770 looks like a reference -- Missing reference section? 'RFC-2822' on line 783 looks like a reference Summary: 15 errors (**), 0 flaws (~~), 5 warnings (==), 18 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IMAP Extensions Working Group M. Crispin 2 INTERNET-DRAFT: IMAP SORT K. 
Murchison 3 Document: internet-drafts/draft-ietf-imapext-sort-17.txt May 2004 5 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS 7 Status of this Memo 9 This document is an Internet-Draft and is in full conformance with 10 all provisions of Section 10 of RFC 2026. 12 Internet-Drafts are working documents of the Internet Engineering 13 Task Force (IETF), its areas, and its working groups. Note that 14 other groups may also distribute working documents as 15 Internet-Drafts. 17 Internet-Drafts are draft documents valid for a maximum of six months 18 and may be updated, replaced, or obsoleted by other documents at any 19 time. It is inappropriate to use Internet-Drafts as reference 20 material or to cite them other than as "work in progress." 22 The list of current Internet-Drafts can be accessed at 23 http://www.ietf.org/ietf/1id-abstracts.txt 25 To view the list Internet-Draft Shadow Directories, see 26 http://www.ietf.org/shadow.html. 28 A revised version of this document will be submitted to the RFC 29 editor as an Informational Document for the Internet Community. 31 A revised version of this draft document will be submitted to the RFC 32 editor as a Proposed Standard for the Internet Community. Discussion 33 and suggestions for improvement are requested, and should be sent to 34 ietf-imapext@IMC.ORG. This document will expire before 23 November 2004. 35 Distribution of this memo is unlimited. 37 Abstract 39 This document describes the base-level server-based sorting and 40 threading extensions to the [IMAP] protocol. These extensions 41 provide substantial performance improvements for IMAP clients which 42 offer sorted and threaded views. 44 1. Introduction 46 The SORT and THREAD extensions to the [IMAP] protocol provide a means 47 of server-based sorting and threading of messages, without requiring 48 that the client download the necessary data to do so itself. This is 49 particularly useful for online clients as described in [IMAP-MODELS]. 
51 A server which supports the base-level SORT extension indicates this 52 with a capability name which starts with "SORT". Future, 53 upwards-compatible extensions to the SORT extension will all start 54 with "SORT", indicating support for this base level. A server which 55 implements the SORT extension SHOULD also implement the COMPARATOR 56 extension as described in [IMAP-I18N]. 58 A server which supports the THREAD extension indicates this with one 59 or more capability names consisting of "THREAD=" followed by a 60 supported threading algorithm name as described in this document. 61 This provides for future upwards-compatible extensions. 63 2. Terminology 65 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 66 "SHOULD", "SHOULD NOT", "MAY", and "OPTIONAL" in this document are to 67 be interpreted as described in [KEYWORDS]. 69 The word "can" (not "may") is used to refer to a possible 70 circumstance or situation, as opposed to an optional facility of the 71 protocol. 73 "User" is used to refer to a human user, whereas "client" refers to 74 the software being run by the user. 76 In examples, "C:" and "S:" indicate lines sent by the client and 77 server respectively. 79 2.1 Base Subject 81 Subject sorting and threading use the "base subject," which has 82 specific subject artifacts removed. Due to the complexity of these 83 artifacts, the formal syntax for the subject extraction rules is 84 ambiguous. The following procedure is followed to determine the 85 actual "base subject" which is used to sort by subject, using the 86 [ABNF] formal syntax rules described in section 5: 88 (1) Convert any RFC 2047 encoded-words in the subject to 89 UTF-8 as described in "internationalization 90 considerations." Convert all tabs and continuations to 91 space. Convert all multiple spaces to a single space. 93 (2) Remove all trailing text of the subject that matches 94 the subj-trailer ABNF, repeat until no more matches are 95 possible. 
97 (3) Remove all prefix text of the subject that matches the 98 subj-leader ABNF. 100 (4) If there is prefix text of the subject that matches the 101 subj-blob ABNF, and removing that prefix leaves a non-empty 102 subj-base, then remove the prefix text. 104 (5) Repeat (3) and (4) until no matches remain. 106 Note: it is possible to defer step (2) until step (6), but this 107 requires checking for subj-trailer in step (4). 109 (6) If the resulting text begins with the subj-fwd-hdr ABNF 110 and ends with the subj-fwd-trl ABNF, remove the 111 subj-fwd-hdr and subj-fwd-trl and repeat from step (2). 113 (7) The resulting text is the "base subject" used in the 114 SORT. 116 All servers and disconnected (as described in [IMAP-MODEL]) clients 117 MUST use exactly this algorithm when sorting by subject. Otherwise 118 there is potential for a user to get inconsistent results based on 119 whether they are running in connected or disconnected mode. 121 2.2 Sent Date 123 As used in this document, the term "sent date" refers to the date and 124 time from the Date: header, adjusted by time zone to normalize to UTC. 125 For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the UTC 126 date and time of "1 Jan 2001 00:01:33 +0000". 128 If the time zone is invalid, the date and time SHOULD be treated as UTC. 129 If the time is also invalid, the time SHOULD be treated as 00:00:00. If 130 there is no valid date or time, the date and time SHOULD be treated as 131 00:00:00 on the earliest possible date. 133 This differs from the date-related criteria in the SEARCH command 134 (described in [IMAP] section 6.4.4), which use just the date and not 135 the time, and are not adjusted by time zone. 137 3. Additional Commands 139 These commands are extensions to the [IMAP] base protocol. 141 The section headings are intended to correspond with where they would 142 be located in the main document if they were part of the base 143 specification. 145 BASE.6.4.SORT. 
SORT Command 147 Arguments: sort program 148 charset specification 149 searching criteria (one or more) 151 Data: untagged responses: SORT 153 Result: OK - sort completed 154 NO - sort error: can't sort that charset or 155 criteria 156 BAD - command unknown or arguments invalid 158 The SORT command is a variant of SEARCH with sorting semantics for 159 the results. Sort has two arguments before the searching criteria 160 argument; a parenthesized list of sort criteria, and the searching 161 charset. 163 The charset argument is mandatory (unlike SEARCH) and indicates 164 the [CHARSET] of the strings that appear in the searching criteria. 165 The US-ASCII and UTF-8 charsets MUST be implemented. All other 166 charsets are optional. 168 There is also a UID SORT command which returns unique identifiers 169 instead of message sequence numbers. Note that there are separate 170 searching criteria for message sequence numbers and UIDs; thus the 171 arguments to UID SORT are interpreted the same as in SORT. This is 172 analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID 173 FETCH, or UID STORE. 175 The SORT command first searches the mailbox for messages that 176 match the given searching criteria using the charset argument for 177 the interpretation of strings in the searching criteria. It then 178 returns the matching messages in an untagged SORT response, sorted 179 according to one or more sort criteria. 181 Sorting is in ascending order. Earlier dates sort before later 182 dates; smaller sizes sort before larger sizes; and strings are 183 sorted according to ascending values established by their 184 collation algorithm (see under "Internationalization 185 Considerations"). 187 If two or more messages exactly match according to the sorting 188 criteria, these messages are sorted according to the order in 189 which they appear in the mailbox. In other words, there is an 190 implicit sort criterion of "sequence number". 
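The ascending order and the implicit "sequence number" criterion described above can be sketched in Python. This is an illustrative sketch only and not part of the protocol: the message dictionaries, the field names "seq" and "size", and the function sort_messages are all hypothetical.

```python
def sort_messages(messages, criterion):
    """Return sequence numbers in ascending order of criterion(message).

    Hypothetical helper: each message is a dict with a "seq" key holding
    its message sequence number. Messages that compare equal on the
    criterion fall back to mailbox order (the implicit final criterion).
    """
    ordered = sorted(messages, key=lambda m: (criterion(m), m["seq"]))
    return [m["seq"] for m in ordered]


# Assumed sample mailbox: messages 1 and 3 have the same size.
mailbox = [
    {"seq": 1, "size": 300},
    {"seq": 2, "size": 100},
    {"seq": 3, "size": 300},
    {"seq": 4, "size": 50},
]
by_size = sort_messages(mailbox, lambda m: m["size"])
print(by_size)  # [4, 2, 1, 3]
```

Appending the sequence number as the last element of the sort key keeps the tie-break explicit rather than relying on the stability of the underlying sort.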
192 When multiple sort criteria are specified, the result is sorted in 193 the priority order in which the criteria appear. For example, 194 (SUBJECT DATE) will sort messages in order by their base subject 195 text; and for messages with the same base subject text will sort 196 by their sent date. 198 Untagged EXPUNGE responses are not permitted while the server is 199 responding to a SORT command, but are permitted during a UID SORT 200 command. 202 The defined sort criteria are as follows. Refer to the Formal 203 Syntax section for the precise syntactic definitions of the 204 arguments. If the associated RFC-822 header for a particular 205 criterion is absent, it is treated as the empty string. The empty 206 string always collates before non-empty strings. 208 ARRIVAL 209 Internal date and time of the message. This differs from the 210 ON criteria in SEARCH, which uses just the internal date. 212 CC 213 RFC-822 local-part of the first "cc" address. 215 DATE 216 Sent date and time from the Date: header, adjusted by time 217 zone. This differs from the SENTON criteria in SEARCH, which 218 uses just the date and not the time, nor adjusts by time zone. 220 FROM 221 RFC-822 local-part of the first "From" address. 223 REVERSE 224 Followed by another sort criterion, has the effect of that 225 criterion but in reverse (descending) order. 226 Note: REVERSE only reverses a single criterion, and does not 227 affect the implicit "sequence number" sort criterion if all 228 other criteria are identical. Consequently, a sort of 229 REVERSE SUBJECT is not the same as a reverse ordering of a 230 SUBJECT sort. This can be avoided by use of additional 231 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE 232 DATE. In general, however, it's better (and faster, if the 233 client has a "reverse current ordering" command) to reverse 234 the results in the client instead of issuing a new SORT. 236 SIZE 237 Size of the message in octets. 239 SUBJECT 240 Base subject text. 
242 TO 243 RFC-822 local-part of the first "To" address. 245 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994 246 S: * SORT 2 84 882 247 S: A282 OK SORT completed 248 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL 249 S: * SORT 5 3 4 1 2 250 S: A283 OK SORT completed 251 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox" 252 S: * SORT 253 S: A284 OK SORT completed 255 BASE.6.4.THREAD. THREAD Command 257 Arguments: threading algorithm 258 charset specification 259 searching criteria (one or more) 261 Data: untagged responses: THREAD 263 Result: OK - thread completed 264 NO - thread error: can't thread that charset or 265 criteria 266 BAD - command unknown or arguments invalid 268 The THREAD command is a variant of SEARCH with threading semantics 269 for the results. Thread has two arguments before the searching 270 criteria argument; a threading algorithm, and the searching 271 charset. 273 The charset argument is mandatory (unlike SEARCH) and indicates 274 the [CHARSET] of the strings that appear in the searching criteria. 275 The US-ASCII and UTF-8 charsets MUST be implemented. All other 276 charsets are optional. 278 There is also a UID THREAD command which returns unique identifiers 279 instead of message sequence numbers. Note that there are separate 280 searching criteria for message sequence numbers and UIDs; thus the 281 arguments to UID THREAD are interpreted the same as in THREAD. This is 282 analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID 283 FETCH, or UID STORE. 285 The THREAD command first searches the mailbox for messages that 286 match the given searching criteria using the charset argument for 287 the interpretation of strings in the searching criteria. It then 288 returns the matching messages in an untagged THREAD response, 289 threaded according to the specified threading algorithm. 291 All collation is in ascending order. 
Earlier dates collate before 292 later dates and strings are collated according to ascending values 293 established by their collation algorithm (see under 294 "Internationalization Considerations"). 296 The defined threading algorithms are as follows: 298 ORDEREDSUBJECT 300 The ORDEREDSUBJECT threading algorithm is also referred to as 301 "poor man's threading." The searched messages are sorted by 302 base subject and then by the sent date. The messages are then 303 split into separate threads, with each thread containing 304 messages with the same base subject text. Finally, the threads 305 are sorted by the sent date of the first message in the thread. 307 The first message of each thread (the "root") is a sibling 308 of the first messages of the other threads. The second message of a thread is the child of 309 the first message, and subsequent messages of the thread are 310 siblings of the second message and hence children of the 311 message at the root. Hence, there are no grandchildren in 312 ORDEREDSUBJECT threading. 314 Note: early drafts of this specification specified 315 that each message in an ORDEREDSUBJECT thread is a child 316 (as opposed to a sibling) of the previous message. This 317 is now deprecated. For compatibility with servers which 318 may still use the old definition, client implementations 319 SHOULD treat descendants of a child as being siblings of 320 that child. 322 This is because the old definition mistakenly indicated 323 that there was a parent/child relationship between 324 successive messages in a thread; when in fact there was 325 only a chronological relationship. In clients which 326 indicate parent/child relationships in a thread tree, 327 this would indicate levels of descent which did not 328 exist. 330 REFERENCES 332 The REFERENCES threading algorithm is based on the [THREADING] 333 algorithm used in "Netscape Mail and News" versions 2.0 334 through 3.0. 
This algorithm threads the searched messages by 335 grouping them together in parent/child relationships based on 336 which messages are replies to others. The parent/child 337 relationships are built using two methods: reconstructing a 338 message's ancestry using the references contained within it; 339 and checking the original (not base) subject of a message to 340 see if it is a reply to (or forward of) another message. 342 Note: "Message ID" in the following description refers to a 343 normalized form of the msg-id in [RFC 2822]. The actual 344 text in an RFC 2822 message may use quoting, resulting in multiple 345 ways of expressing the same Message ID. Implementations of 346 the REFERENCES threading algorithm MUST normalize any msg-id 347 in order to avoid false non-matches due to differences in 348 quoting. 350 For example, the msg-id 351 <"01KF8JCEOCBS0045PS"@xxx.yyy.com> 352 and the msg-id 353 <01KF8JCEOCBS0045PS@xxx.yyy.com> 354 MUST be interpreted as being the same Message ID. 356 The references used for reconstructing a message's ancestry are 357 found using the following rules: 359 If a message contains a References header line, then use the 360 Message IDs in the References header line as the references. 362 If a message does not contain a References header line, or 363 the References header line does not contain any valid 364 Message IDs, then use the first (if any) valid Message ID 365 found in the In-Reply-To header line as the only reference 366 (parent) for this message. 368 Note: Although [RFC 2822] permits multiple Message IDs in 369 the In-Reply-To header, in actual practice this 370 discipline has not been followed. For example, 371 In-Reply-To headers have been observed with message 372 addresses after the Message ID, and there are no good 373 heuristics for software to determine the difference. 374 This is not a problem with the References header, however. 
376 If a message does not contain an In-Reply-To header line, or 377 the In-Reply-To header line does not contain a valid Message 378 ID, then the message does not have any references (NIL). 380 A message is considered to be a reply or forward if the base 381 subject extraction rules, applied to the original subject, 382 remove any of the following: a subj-refwd, a "(fwd)" 383 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl. 385 The REFERENCES algorithm is significantly more complex than 386 ORDEREDSUBJECT and consists of six main steps. These steps are 387 outlined in detail below. 389 (1) For each searched message: 391 (A) Using the Message IDs in the message's references, link 392 the corresponding messages (those whose Message-ID header 393 line contains the given reference Message ID) together as 394 parent/child. Make the first reference the parent of the 395 second (and the second a child of the first), the second the 396 parent of the third (and the third a child of the second), 397 etc. The following rules govern the creation of these 398 links: 400 If a message does not contain a Message-ID header line, 401 or the Message-ID header line does not contain a valid 402 Message ID, then assign a unique Message ID to this 403 message. 405 If two or more messages have the same Message ID, then 406 only use that Message ID in the first (lowest sequence 407 number) message, and assign a unique Message ID to each 408 of the subsequent messages with a duplicate of that 409 Message ID. 411 If no message can be found with a given Message ID, 412 create a dummy message with this ID. Use this dummy 413 message for all subsequent references to this ID. 415 If a message already has a parent, don't change the 416 existing link. This is done because the References 417 header line may have been truncated by a MUA. As a 418 result, there is no guarantee that the messages 419 corresponding to adjacent Message IDs in the References 420 header line are parent and child. 
422 Do not create a parent/child link if creating that link 423 would introduce a loop. For example, before making 424 message A the parent of B, make sure that A is not a 425 descendent of B. 427 Note: Message ID comparisons are case-sensitive. 429 (B) Create a parent/child link between the last reference 430 (or NIL if there are no references) and the current message. 431 If the current message already has a parent, it is probably 432 the result of a truncated References header line, so break 433 the current parent/child link before creating the new 434 correct one. As in step 1.A, do not create the parent/child 435 link if creating that link would introduce a loop. Note 436 that if this message has no references, that it will now 437 have no parent. 439 Note: The parent/child links created in steps 1.A and 1.B 440 MUST be kept consistent with one another at ALL times. 442 (2) Gather together all of the messages that have no parents 443 and make them all children (siblings of one another) of a dummy 444 parent (the "root"). These messages constitute the first 445 (head) message of the threads created thus far. 447 (3) Prune dummy messages from the thread tree. Traverse each 448 thread under the root, and for each message: 450 If it is a dummy message with NO children, delete it. 452 If it is a dummy message with children, delete it, but 453 promote its children to the current level. In other words, 454 splice them in with the dummy's siblings. 456 Do not promote the children if doing so would make them 457 children of the root, unless there is only one child. 459 (4) Sort the messages under the root (top-level siblings only) 460 by sent date. In the case of an exact match on sent date or if 461 either of the Date: headers used in a comparison can not be 462 parsed, use the order in which the messages appear in the 463 mailbox (that is, by sequence number) to determine the order. 
464 In the case of a dummy message, sort its children by sent date 465 and then use the first child for the top-level sort. 467 (5) Gather together messages under the root that have the same 468 base subject text. 470 (A) Create a table for associating base subjects with 471 messages, called the subject table. 473 (B) Populate the subject table with one message per 474 base subject. For each child of the root: 476 (i) Find the subject of this thread, by using the base 477 subject from either the current message or its first 478 child if the current message is a dummy. This is the 479 thread subject. 481 (ii) If the thread subject is empty, skip this message. 483 (iii) Look up the message associated with the thread 484 subject in the subject table. 486 (iv) If there is no message in the subject table with the 487 thread subject, add the current message and the thread 488 subject to the subject table. 490 Otherwise, if the message in the subject table is not a 491 dummy, AND either of the following criteria is true: 493 The current message is a dummy, OR 495 The message in the subject table is a reply or forward 496 and the current message is not. 498 then replace the message in the subject table with the 499 current message. 501 (C) Merge threads with the same thread subject. For each 502 child of the root: 504 (i) Find the message's thread subject as in step 5.B.i 505 above. 507 (ii) If the thread subject is empty, skip this message. 509 (iii) Look up the message associated with this thread 510 subject in the subject table. 512 (iv) If the message in the subject table is the current 513 message, skip this message. 515 Otherwise, merge the current message with the one in the 516 subject table using the following rules: 518 If both messages are dummies, append the current 519 message's children to the children of the message in 520 the subject table (the children of both messages 521 become siblings), and then delete the current message. 
523 If the message in the subject table is a dummy and the 524 current message is not, make the current message a 525 child of the message in the subject table (a sibling 526 of its children). 528 If the current message is a reply or forward and the 529 message in the subject table is not, make the current 530 message a child of the message in the subject table (a 531 sibling of its children). 533 Otherwise, create a new dummy message and make both 534 the current message and the message in the subject 535 table children of the dummy. Then replace the message 536 in the subject table with the dummy message. 538 Note: Subject comparisons are case-insensitive, as 539 described under "Internationalization 540 Considerations." 542 (6) Traverse the messages under the root and sort each set of 543 siblings by sent date. Traverse the messages in such a way 544 that the "youngest" set of siblings are sorted first, and the 545 "oldest" set of siblings are sorted last (grandchildren are 546 sorted before children, etc). In the case of an exact match on 547 sent date or if either of the Date: headers used in a 548 comparison can not be parsed, use the order in which the 549 messages appear in the mailbox (that is, by sequence number) to 550 determine the order. In the case of a dummy message (which can 551 only occur with top-level siblings), use its first child for 552 sorting. 
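Step (1) of the REFERENCES algorithm above (linking messages by their references, creating dummy messages, and refusing links that would introduce a loop) can be sketched in Python. This is a minimal, non-normative sketch under stated assumptions: every message is assumed to already carry a unique, valid, normalized Message ID, so the rules for missing and duplicate Message IDs are omitted. The names Container, would_loop, unlink, link and build_links are hypothetical.

```python
class Container:
    """One node of the thread tree; is_dummy stays True until a real
    message with this Message ID is seen."""

    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.is_dummy = True
        self.parent = None
        self.children = []


def would_loop(parent, child):
    # A loop arises if `parent` is `child` itself or a descendant of it.
    node = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def unlink(child):
    if child.parent is not None:
        child.parent.children.remove(child)
        child.parent = None


def link(parent, child):
    if not would_loop(parent, child):
        unlink(child)
        child.parent = parent
        parent.children.append(child)


def build_links(messages):
    """messages: iterable of (msg_id, [reference ids]) in mailbox order."""
    table = {}

    def container(msg_id):
        if msg_id not in table:
            table[msg_id] = Container(msg_id)  # dummy until seen
        return table[msg_id]

    for msg_id, references in messages:
        current = container(msg_id)
        current.is_dummy = False
        # Step 1.A: chain the references, leaving existing parents alone.
        previous = None
        for ref in references:
            node = container(ref)
            if previous is not None and node.parent is None:
                link(previous, node)
            previous = node
        # Step 1.B: break any existing link, then attach the current
        # message to its last reference (or leave it parentless).
        unlink(current)
        if previous is not None:
            link(previous, current)
    return table


table = build_links([("a", []), ("b", ["a"]), ("c", ["a", "b"]),
                     ("d", ["x", "c"])])
print(table["d"].parent.msg_id)  # c
```

Here "x" is never seen as a real message, so it remains a dummy container that step (3) would later prune.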
554 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 555 S: * THREAD (166)(167)(168)(169)(172)(170)(171) 556 (173)(174 (175)(176)(178)(181)(180))(179)(177 557 (183)(182)(188)(184)(185)(186)(187)(189))(190) 558 (191)(192)(193)(194 195)(196 (197)(198))(199) 559 (200 202)(201)(203)(204)(205)(206 207)(208) 560 S: A283 OK THREAD completed 561 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" 562 S: * THREAD 563 S: A284 OK THREAD completed 564 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000 565 S: * THREAD (166)(167)(168)(169)(172)((170)(179)) 566 (171)(173)((174)(175)(176)(178)(181)(180)) 567 ((177)(183)(182)(188 (184)(189))(185 186)(187)) 568 (190)(191)(192)(193)((194)(195 196))(197 198) 569 (199)(200 202)(201)(203)(204)(205 206 207)(208) 570 S: A285 OK THREAD completed 572 Note: The line breaks in the first and third server 573 responses are for editorial clarity and do not appear in 574 real THREAD responses. 576 4. Additional Responses 578 These responses are extensions to the [IMAP] base protocol. 580 The section headings of these responses are intended to correspond 581 with where they would be located in the main document. 583 BASE.7.2.SORT. SORT Response 585 Data: zero or more numbers 587 The SORT response occurs as a result of a SORT or UID SORT 588 command. The number(s) refer to those messages that match the 589 search criteria. For SORT, these are message sequence numbers; 590 for UID SORT, these are unique identifiers. Each number is 591 delimited by a space. 593 Example: S: * SORT 2 3 6 595 BASE.7.2.THREAD. THREAD Response 597 Data: zero or more threads 599 The THREAD response occurs as a result of a THREAD or UID THREAD 600 command. It contains zero or more threads. A thread consists of 601 a parenthesized list of thread members. 603 Thread members consist of zero or more message numbers, delimited 604 by spaces, indicating successive parent and child. 
           This continues
605        until the thread splits into multiple subthreads, at which point
606        the thread nests into multiple subthreads with the first member
607        of each subthread being siblings at this level. There is no limit
608        to the nesting of threads.

610        The message numbers refer to those messages that match the search
611        criteria. For THREAD, these are message sequence numbers; for UID
612        THREAD, these are unique identifiers.

614     Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

616        The first thread consists only of message 2. The second thread
617        consists of the messages 3 (parent) and 6 (child), after which it
618        splits into two subthreads; the first of which contains messages 4
619        (child of 6, sibling of 44) and 23 (child of 4), and the second of
620        which contains messages 44 (child of 6, sibling of 4), 7 (child of
621        44), and 96 (child of 7). Since some later messages are parents
622        of earlier messages, the messages were probably moved from some
623        other mailbox at different times.

625           -- 2

627           -- 3
628              \-- 6
629                  |-- 4
630                  |   \-- 23
631                  |
632                  \-- 44
633                       \-- 7
634                            \-- 96

636     Example:    S: * THREAD ((3)(5))

638        In this example, 3 and 5 are siblings of a parent which does not
639        match the search criteria (and/or does not exist in the mailbox);
640        however they are members of the same thread.

642  5. Formal Syntax of SORT and THREAD Commands and Responses

644     The following syntax specification uses the Augmented Backus-Naur
645     Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF]
646     rules defined in [IMAP].
648  sort            = ["UID" SP] "SORT" SP sort-criteria SP search-criteria

650  sort-criteria   = "(" sort-criterion *(SP sort-criterion) ")"

652  sort-criterion  = ["REVERSE" SP] sort-key

654  sort-key        = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
655                    "SUBJECT" / "TO"

657  thread          = ["UID" SP] "THREAD" SP thread-alg SP search-criteria

659  thread-alg      = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext

661  thread-alg-ext  = atom
662                    ; New algorithms MUST be registered with IANA

664  search-criteria = charset 1*(SP search-key)

666  charset         = astring
667                    ; CHARSET values MUST be registered with IANA

669  sort-data       = "SORT" *(SP nz-number)

671  thread-data     = "THREAD" [SP 1*thread-list]

673  thread-list     = "(" (thread-members / thread-nested) ")"

675  thread-members  = nz-number *(SP nz-number) [SP thread-nested]

677  thread-nested   = 2*thread-list

679  The following syntax describes base subject extraction rules (2)-(6):

681  subject         = *subj-leader [subj-middle] *subj-trailer

683  subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

685  subj-blob       = "[" *BLOBCHAR "]" *WSP

687  subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

689  subj-fwd-hdr    = "[fwd:"

691  subj-fwd-trl    = "]"

693  subj-leader     = (*subj-blob subj-refwd) / WSP

695  subj-middle     = *subj-blob (subj-base / subj-fwd)
696                    ; last subj-blob is subj-base if subj-base would
697                    ; otherwise be empty

699  subj-trailer    = "(fwd)" / WSP

701  subj-base       = NONWSP *(*WSP NONWSP)
702                    ; can be a subj-blob

704  BLOBCHAR        = %x01-5a / %x5c / %x5e-7f
705                    ; any CHAR except '[' and ']'

707  NONWSP          = %x01-08 / %x0a-1f / %x21-7f
708                    ; any CHAR other than WSP

710  6. Security Considerations

712     The SORT and THREAD extensions do not raise any security
713     considerations that are not present in the base [IMAP] protocol, and
714     these issues are discussed in [IMAP].
        Nevertheless, it is important
715     to remember that [IMAP] protocol transactions, including message
716     data, are sent in the clear over the network unless protection from
717     snooping is in effect, whether through STARTTLS, privacy
718     protection negotiated in the AUTHENTICATE command, or some other
719     protection mechanism.

721  7. Internationalization Considerations

723     As described in [IMAP-I18N], strings in charsets other than US-ASCII
724     and UTF-8 MUST be converted to UTF-8 and compared in ascending order
725     according to the selected or active collation algorithm. If the server
726     does not support the [IMAP-I18N] COMPARATOR extension, the collation
727     algorithm used is the "en;ascii-casemap" collation described in
728     [COMPARATOR].

730     Translations of the "re" or "fw"/"fwd" tokens are not specified for
731     removal in the base subject extraction process. An attempt to add such
732     translated tokens would result in a geometrically complex, and
733     ultimately unimplementable, task.

735     Instead, note that [RFC-2822] section 3.6.5 recommends that "re:" (from
736     the Latin "res", in the matter of) be used to identify a reply.
737     Although the multiple forms of token used to identify a forwarded
738     message show that there is considerable variation in the wild,
739     the variations are (still) manageable. Consequently, it is
740     suggested that "re:" and one of the variations of the tokens for forward
741     supported by the base subject extraction rules be adopted for Internet
742     mail messages, since doing so makes it a simple display-time task to
743     localize the token language for the user.

745  8. IANA Considerations

747     [IMAP] capabilities are registered by publishing a standards track or
748     IESG approved experimental RFC. This document constitutes registration
749     of the SORT and THREAD capabilities in the [IMAP] capabilities registry.
751     This document creates a new [IMAP] threading algorithms registry, which
752     registers threading algorithms by publishing a standards track or IESG
753     approved experimental RFC. This document constitutes registration of
754     the ORDEREDSUBJECT and REFERENCES algorithms in that registry.

756  Appendices

758  A. Normative References

760     The following documents are normative to this document:

762     [ABNF]           Crocker, D. and Overell, P. "Augmented BNF
763                      for Syntax Specifications: ABNF", RFC 2234,
764                      November 1997.

766     [CHARSET]        Freed, N. and J. Postel, "IANA Character Set
767                      Registration Procedures", RFC 2978, October
768                      2000.

770     [COMPARATOR]     Newman, C. "Internet Application Protocol
771                      Collation Registry", Work in Progress.

773     [IMAP]           Crispin, M. "Internet Message Access Protocol -
774                      Version 4rev1", RFC 3501, March 2003.

776     [IMAP-I18N]      Newman, C. "Internet Message Access Protocol
777                      Internationalization", Work in Progress.

779     [KEYWORDS]       Bradner, S. "Key words for use in RFCs to
780                      Indicate Requirement Levels", RFC 2119, Harvard
781                      University, March 1997.

783     [RFC-2822]       Resnick, P. "Internet Message Format", RFC 2822,
784                      April 2001.

786  B. Informative References

788     The following documents are informative to this document:

790     [IMAP-MODELS]    Crispin, M. "Distributed Electronic Mail Models
791                      in IMAP4", RFC 1733, December 1994.

793     [THREADING]      Zawinski, J. "Message Threading",
794                      http://www.jwz.org/doc/threading.html, 1997-2002.

796  Author's Address

798     Mark R. Crispin
799     Networks and Distributed Computing
800     University of Washington
801     4545 15th Avenue NE
802     Seattle, WA 98105-4527

804     Phone: (206) 543-5762

806     EMail: MRC@CAC.Washington.EDU

808     Kenneth Murchison
809     Oceana Matrix Ltd.
810     21 Princeton Place
811     Orchard Park, NY 14127

813     Phone: (716) 662-8973 x26

815     EMail: ken@oceana.com

817  IPR Disclosure Acknowledgement

819     By submitting this Internet-Draft, I certify that any applicable
820     patent or other IPR claims of which I am aware have been disclosed,
821     and any of which I become aware will be disclosed, in accordance with
822     RFC 3668.

824  Intellectual Property Statement

826     The IETF takes no position regarding the validity or scope of any
827     intellectual property or other rights that might be claimed to
828     pertain to the implementation or use of the technology described in
829     this document or the extent to which any license under such rights
830     might or might not be available; neither does it represent that it
831     has made any effort to identify any such rights. Information on the
832     IETF's procedures with respect to rights in standards-track and
833     standards-related documentation can be found in BCP-11. Copies of
834     claims of rights made available for publication and any assurances of
835     licenses to be made available, or the result of an attempt made to
836     obtain a general license or permission for the use of such
837     proprietary rights by implementors or users of this specification can
838     be obtained from the IETF Secretariat.

840     The IETF invites any interested party to bring to its attention any
841     copyrights, patents or patent applications, or other proprietary
842     rights which may cover technology that may be required to practice
843     this standard. Please address the information to the IETF Executive
844     Director.

846  Full Copyright Statement

848     Copyright (C) The Internet Society (2004). This document is subject
849     to the rights, licenses and restrictions contained in BCP 78 and
850     except as set forth therein, the authors retain all their rights.
852     This document and the information contained herein are provided on an
853     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
854     REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
855     INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
856     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
857     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
858     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

860  Intellectual Property

862     The IETF takes no position regarding the validity or scope of any
863     Intellectual Property Rights or other rights that might be claimed
864     to pertain to the implementation or use of the technology
865     described in this document or the extent to which any license
866     under such rights might or might not be available; nor does it
867     represent that it has made any independent effort to identify any
868     such rights. Information on the procedures with respect to
869     rights in RFC documents can be found in BCP 78 and BCP 79.

871     Copies of IPR disclosures made to the IETF Secretariat and any
872     assurances of licenses to be made available, or the result of an
873     attempt made to obtain a general license or permission for the use
874     of such proprietary rights by implementers or users of this
875     specification can be obtained from the IETF on-line IPR repository
876     at http://www.ietf.org/ipr.

878     The IETF invites any interested party to bring to its attention
879     any copyrights, patents or patent applications, or other
880     proprietary rights that may cover technology that may be required
881     to implement this standard. Please address the information to the
882     IETF at ietf-ipr@ietf.org.

884  Acknowledgement

886     Funding for the RFC Editor function is currently provided by the
887     Internet Society.