idnits 2.17.1 draft-ietf-imapext-sort-20.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 856. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 867. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 874. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 880. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document is more than 15 pages and seems to lack a Table of Contents. == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 886 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([IMAP]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. 
If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 10, 2008) is 5884 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 3501 (ref. 'IMAP') (Obsoleted by RFC 9051) -- Possible downref: Non-RFC (?) normative reference: ref. 'IMAP-I18N' ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322) Summary: 5 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 IMAP Extensions Working Group M. Crispin 2 Internet-Draft K. Murchison 3 Intended status: Proposed Standard March 10, 2008 4 Expires: September 10, 2008 5 Document: internet-drafts/draft-ietf-imapext-sort-20.txt 7 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS 9 Status of this Memo 11 By submitting this Internet-Draft, each author represents that 12 any applicable patent or other IPR claims of which he or she is 13 aware have been or will be disclosed, and any of which he or she 14 becomes aware will be disclosed, in accordance with Section 6 of 15 BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as 20 Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. 
It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 A revised version of this draft document will be submitted to the RFC 34 editor as a Proposed Standard for the Internet Community. Discussion 35 and suggestions for improvement are requested, and should be sent to 36 ietf-imapext@IMC.ORG. 38 Distribution of this memo is unlimited. 40 Abstract 42 This document describes the base-level server-based sorting and 43 threading extensions to the [IMAP] protocol. These extensions 44 provide substantial performance improvements for IMAP clients which 45 offer sorted and threaded views. 47 1. Introduction 49 The SORT and THREAD extensions to the [IMAP] protocol provide a means 50 of server-based sorting and threading of messages, without requiring 51 that the client download the necessary data to do so itself. This is 52 particularly useful for online clients as described in [IMAP-MODELS]. 54 A server which supports the base-level SORT extension indicates this 55 with a capability name which starts with "SORT". Future, 56 upwards-compatible extensions to the SORT extension will all start 57 with "SORT", indicating support for this base level. 59 A server which supports the THREAD extension indicates this with one 60 or more capability names consisting of "THREAD=" followed by a 61 supported threading algorithm name as described in this document. 62 This provides for future upwards-compatible extensions. 64 A server which implements the SORT and/or THREAD extensions MUST 65 collate strings in accordance with the requirements of I18NLEVEL=1, 66 as described in [IMAP-I18N], and SHOULD implement and advertise the 67 I18NLEVEL=1 extension. 
Alternatively, a server MAY implement 68 I18NLEVEL=2 (or higher) and comply with the rules of that level. 70 Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by 71 several years. At the time of this writing, all known server 72 implementations of SORT and THREAD comply with the rules of 73 I18NLEVEL=1, but do not necessarily advertise it. As discussed 74 in [IMAP-I18N] section 4.5, all server implementations should 75 eventually be updated to comply with the I18NLEVEL=2 extension. 77 Historical note: the REFERENCES threading algorithm is based on the 78 [THREADING] algorithm written and used in "Netscape Mail and News" 79 versions 2.0 through 3.0. 81 2. Terminology 83 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 84 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 85 document are to be interpreted as described in [KEYWORDS]. 87 The word "can" (not "may") is used to refer to a possible 88 circumstance or situation, as opposed to an optional facility of the 89 protocol. 91 "User" is used to refer to a human user, whereas "client" refers to 92 the software being run by the user. 94 In examples, "C:" and "S:" indicate lines sent by the client and 95 server respectively. 97 2.1 Base Subject 99 Subject sorting and threading use the "base subject," which has 100 specific subject artifacts removed. Due to the complexity of these 101 artifacts, the formal syntax for the subject extraction rules is 102 ambiguous. The following procedure is used to determine the 103 "base subject", using the [ABNF] formal syntax rules described in 104 section 5: 106 (1) Convert any RFC 2047 encoded-words in the subject to 107 UTF-8 as described in "internationalization 108 considerations." Convert all tabs and continuations to 109 space. Convert all multiple spaces to a single space. 111 (2) Remove all trailing text of the subject that matches 112 the subj-trailer ABNF; repeat until no more matches are 113 possible. 
115 (3) Remove all prefix text of the subject that matches the 116 subj-leader ABNF. 118 (4) If there is prefix text of the subject that matches the 119 subj-blob ABNF, and removing that prefix leaves a non-empty 120 subj-base, then remove the prefix text. 122 (5) Repeat (3) and (4) until no matches remain. 124 Note: it is possible to defer step (2) until step (6), but this 125 requires checking for subj-trailer in step (4). 127 (6) If the resulting text begins with the subj-fwd-hdr ABNF 128 and ends with the subj-fwd-trl ABNF, remove the 129 subj-fwd-hdr and subj-fwd-trl and repeat from step (2). 131 (7) The resulting text is the "base subject" used in the 132 SORT. 134 All servers and disconnected (as described in [IMAP-MODELS]) clients 135 MUST use exactly this algorithm to determine the "base subject". 136 Otherwise there is potential for a user to get inconsistent results 137 based on whether they are running in connected or disconnected mode. 139 2.2 Sent Date 141 As used in this document, the term "sent date" refers to the date and 142 time from the Date: header, adjusted by time zone to normalize to 143 UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the 144 UTC date and time of "1 Jan 2001 00:01:33 +0000". 146 If the time zone is invalid, the date and time SHOULD be treated as 147 UTC. If the time is also invalid, the time SHOULD be treated as 148 00:00:00. If there is no valid date or time, the date and time 149 SHOULD be treated as 00:00:00 on the earliest possible date. 151 This differs from the date-related criteria in the SEARCH command 152 (described in [IMAP] section 6.4.4), which use just the date and not 153 the time, and are not adjusted by time zone. 155 If the sent date can not be determined (a Date: header is missing or 156 can not be parsed), the INTERNALDATE for that message is used as the 157 sent date. 
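The sent date rules above can be sketched in Python as follows. This is a minimal illustration, not a complete implementation: the function name `sent_date` and its parameters are hypothetical, it relies on the standard library's `email.utils.parsedate_to_datetime`, and it does not attempt the partial fallbacks (an invalid time treated as 00:00:00, or the earliest possible date). Any header that cannot be parsed at all falls back to the INTERNALDATE.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def sent_date(date_header, internal_date):
    """Return the sent date normalized to UTC (hypothetical helper).

    date_header   -- raw value of the Date: header, or None if absent
    internal_date -- the message's INTERNALDATE as an aware datetime
    """
    if not date_header:
        return internal_date  # missing Date: header
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return internal_date  # header could not be parsed
    if parsed.tzinfo is None:
        # No usable time zone: treat the date and time as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


internal = datetime(2008, 3, 10, tzinfo=timezone.utc)
print(sent_date("31 Dec 2000 16:01:33 -0800", internal))
```

With the example from the text, "31 Dec 2000 16:01:33 -0800" normalizes to 1 Jan 2001 00:01:33 UTC.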
159 When comparing two sent dates that match exactly, the order in which 160 the two messages appear in the mailbox (that is, by sequence number) 161 is used as a tie-breaker to determine the order. 163 3. Additional Commands 165 These commands are extensions to the [IMAP] base protocol. 167 The section headings are intended to correspond with where they would 168 be located in the main document if they were part of the base 169 specification. 171 BASE.6.4.SORT. SORT Command 173 Arguments: sort program 174 charset specification 175 searching criteria (one or more) 177 Data: untagged responses: SORT 179 Result: OK - sort completed 180 NO - sort error: can't sort that charset or 181 criteria 182 BAD - command unknown or arguments invalid 184 The SORT command is a variant of SEARCH with sorting semantics for 185 the results. Sort has two arguments before the searching criteria 186 argument: a parenthesized list of sort criteria, and the searching 187 charset. 189 The charset argument is mandatory (unlike SEARCH) and indicates 190 the [CHARSET] of the strings that appear in the searching 191 criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 192 All other charsets are optional. 194 There is also a UID SORT command which returns unique identifiers 195 instead of message sequence numbers. Note that there are separate 196 searching criteria for message sequence numbers and UIDs; thus the 197 arguments to UID SORT are interpreted the same as in SORT. This 198 is analogous to the behavior of UID SEARCH, as opposed to UID 199 COPY, UID FETCH, or UID STORE. 201 The SORT command first searches the mailbox for messages that 202 match the given searching criteria using the charset argument for 203 the interpretation of strings in the searching criteria. It then 204 returns the matching messages in an untagged SORT response, sorted 205 according to one or more sort criteria. 207 Sorting is in ascending order. 
Earlier dates sort before later 208 dates; smaller sizes sort before larger sizes; and strings are 209 sorted according to ascending values established by their 210 collation algorithm (see under "Internationalization 211 Considerations"). 213 If two or more messages exactly match according to the sorting 214 criteria, these messages are sorted according to the order in 215 which they appear in the mailbox. In other words, there is an 216 implicit sort criterion of "sequence number". 218 When multiple sort criteria are specified, the result is sorted in 219 the priority order that the criteria appear. For example, 220 (SUBJECT DATE) will sort messages in order by their base subject 221 text; and for messages with the same base subject text will sort 222 by their sent date. 224 Untagged EXPUNGE responses are not permitted while the server is 225 responding to a SORT command, but are permitted during a UID SORT 226 command. 228 The defined sort criteria are as follows. Refer to the Formal 229 Syntax section for the precise syntactic definitions of the 230 arguments. If the associated RFC-822 header for a particular 231 criterion is absent, it is treated as the empty string. The empty 232 string always collates before non-empty strings. 234 ARRIVAL 235 Internal date and time of the message. This differs from the 236 ON criteria in SEARCH, which uses just the internal date. 238 CC 239 [IMAP] addr-mailbox of the first "cc" address. 241 DATE 242 Sent date and time, as described in section 2.2. 244 FROM 245 [IMAP] addr-mailbox of the first "From" address. 247 REVERSE 248 Followed by another sort criterion, has the effect of that 249 criterion but in reverse (descending) order. 250 Note: REVERSE only reverses a single criterion, and does not 251 affect the implicit "sequence number" sort criterion if all 252 other criteria are identical. Consequently, a sort of 253 REVERSE SUBJECT is not the same as a reverse ordering of a 254 SUBJECT sort. 
This can be avoided by use of additional 255 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE 256 DATE. In general, however, it's better (and faster, if the 257 client has a "reverse current ordering" command) to reverse 258 the results in the client instead of issuing a new SORT. 260 SIZE 261 Size of the message in octets. 263 SUBJECT 264 Base subject text. 266 TO 267 [IMAP] addr-mailbox of the first "To" address. 269 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994 270 S: * SORT 2 84 882 271 S: A282 OK SORT completed 272 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL 273 S: * SORT 5 3 4 1 2 274 S: A283 OK SORT completed 275 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox" 276 S: * SORT 277 S: A284 OK SORT completed 279 BASE.6.4.THREAD. THREAD Command 281 Arguments: threading algorithm 282 charset specification 283 searching criteria (one or more) 285 Data: untagged responses: THREAD 287 Result: OK - thread completed 288 NO - thread error: can't thread that charset or 289 criteria 290 BAD - command unknown or arguments invalid 292 The THREAD command is a variant of SEARCH with threading semantics 293 for the results. Thread has two arguments before the searching 294 criteria argument; a threading algorithm, and the searching 295 charset. 297 The charset argument is mandatory (unlike SEARCH) and indicates 298 the [CHARSET] of the strings that appear in the searching 299 criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 300 All other charsets are optional. 302 There is also a UID THREAD command which returns unique 303 identifiers instead of message sequence numbers. Note that there 304 are separate searching criteria for message sequence numbers and 305 UIDs; thus the arguments to UID THREAD are interpreted the same as 306 in THREAD. This is analogous to the behavior of UID SEARCH, as 307 opposed to UID COPY, UID FETCH, or UID STORE. 
309 The THREAD command first searches the mailbox for messages that 310 match the given searching criteria using the charset argument for 311 the interpretation of strings in the searching criteria. It then 312 returns the matching messages in an untagged THREAD response, 313 threaded according to the specified threading algorithm. 315 All collation is in ascending order. Earlier dates collate before 316 later dates and strings are collated according to ascending values 317 established by their collation algorithm (see under 318 "Internationalization Considerations"). 320 Untagged EXPUNGE responses are not permitted while the server is 321 responding to a THREAD command, but are permitted during a UID 322 THREAD command. 324 The defined threading algorithms are as follows: 326 ORDEREDSUBJECT 328 The ORDEREDSUBJECT threading algorithm is also referred to as 329 "poor man's threading." The searched messages are sorted by 330 base subject and then by the sent date. The messages are then 331 split into separate threads, with each thread containing 332 messages with the same base subject text. Finally, the threads 333 are sorted by the sent date of the first message in the thread. 335 The first messages of the threads are siblings of each other 336 (the "root"). The second message of a thread is the child of 337 the first message, and subsequent messages of the thread are 338 siblings of the second message and hence children of the 339 message at the root. Hence, there are no grandchildren in 340 ORDEREDSUBJECT threading. 342 Children in ORDEREDSUBJECT threading do not have descendents. 343 Client implementations SHOULD treat descendents of a child in 344 a server response as being siblings of that child. 346 REFERENCES 348 The REFERENCES threading algorithm threads the searched 349 messages by grouping them together in parent/child 350 relationships based on which messages are replies to others. 
351 The parent/child relationships are built using two methods: 352 reconstructing a message's ancestry using the references 353 contained within it; and checking the original (not base) 354 subject of a message to see if it is a reply to (or forward of) 355 another message. 357 Note: "Message ID" in the following description refers to a 358 normalized form of the msg-id in [RFC-2822]. The actual 359 text in an RFC 2822 may use quoting, resulting in multiple 360 ways of expressing the same Message ID. Implementations of 361 the REFERENCES threading algorithm MUST normalize any msg-id 362 in order to avoid false non-matches due to differences in 363 quoting. 365 For example, the msg-id 366 <"01KF8JCEOCBS0045PS"@xxx.yyy.com> 367 and the msg-id 368 <01KF8JCEOCBS0045PS@xxx.yyy.com> 369 MUST be interpreted as being the same Message ID. 371 The references used for reconstructing a message's ancestry are 372 found using the following rules: 374 If a message contains a References header line, then use the 375 Message IDs in the References header line as the references. 377 If a message does not contain a References header line, or 378 the References header line does not contain any valid 379 Message IDs, then use the first (if any) valid Message ID 380 found in the In-Reply-To header line as the only reference 381 (parent) for this message. 383 Note: Although [RFC-2822] permits multiple Message IDs in 384 the In-Reply-To header, in actual practice this 385 discipline has not been followed. For example, 386 In-Reply-To headers have been observed with message 387 addresses after the Message ID, and there are no good 388 heuristics for software to determine the difference. 389 This is not a problem with the References header however. 391 If a message does not contain an In-Reply-To header line, or 392 the In-Reply-To header line does not contain a valid Message 393 ID, then the message does not have any references (NIL). 
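The reference-selection rules above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the names `normalize_msg_id`, `valid_msg_ids`, and `message_references` are hypothetical, headers are assumed to arrive as a plain dict of unfolded strings, and the regular expression is a deliberately simplified stand-in for the full [RFC-2822] msg-id syntax. Normalization here only removes quoting from the left-hand side, and case is preserved.

```python
import re

# Simplified assumption: a msg-id is "<left@right>" with no whitespace,
# angle brackets, or extra "@" inside. The real grammar is richer.
_MSG_ID = re.compile(r"<([^<>@\s]+@[^<>@\s]+)>")


def normalize_msg_id(raw):
    """Strip optional quoting from the left-hand side of a msg-id."""
    local, _, domain = raw.rpartition("@")
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        # Drop the surrounding quotes and undo backslash escapes.
        local = re.sub(r"\\(.)", r"\1", local[1:-1])
    return f"{local}@{domain}"


def valid_msg_ids(value):
    """Return all normalized Message IDs found in a header value."""
    return [normalize_msg_id(m) for m in _MSG_ID.findall(value or "")]


def message_references(headers):
    """Pick the references for one message (empty list means NIL)."""
    refs = valid_msg_ids(headers.get("References"))
    if refs:
        return refs
    # Fall back to the first valid Message ID in In-Reply-To, if any.
    return valid_msg_ids(headers.get("In-Reply-To"))[:1]


print(normalize_msg_id('"01KF8JCEOCBS0045PS"@xxx.yyy.com'))
```

Taking only the first In-Reply-To match sidesteps the case where other bracketed text follows the Message ID in that header.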
395 A message is considered to be a reply or forward if the base 396 subject extraction rules, applied to the original subject, 397 remove any of the following: a subj-refwd, a "(fwd)" 398 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl. 400 The REFERENCES algorithm is significantly more complex than 401 ORDEREDSUBJECT and consists of six main steps. These steps are 402 outlined in detail below. 404 (1) For each searched message: 406 (A) Using the Message IDs in the message's references, link 407 the corresponding messages (those whose Message-ID header 408 line contains the given reference Message ID) together as 409 parent/child. Make the first reference the parent of the 410 second (and the second a child of the first), the second the 411 parent of the third (and the third a child of the second), 412 etc. The following rules govern the creation of these 413 links: 415 If a message does not contain a Message-ID header line, 416 or the Message-ID header line does not contain a valid 417 Message ID, then assign a unique Message ID to this 418 message. 420 If two or more messages have the same Message ID, then 421 only use that Message ID in the first (lowest sequence 422 number) message, and assign a unique Message ID to each 423 of the subsequent messages with a duplicate of that 424 Message ID. 426 If no message can be found with a given Message ID, 427 create a dummy message with this ID. Use this dummy 428 message for all subsequent references to this ID. 430 If a message already has a parent, don't change the 431 existing link. This is done because the References 432 header line may have been truncated by a MUA. As a 433 result, there is no guarantee that the messages 434 corresponding to adjacent Message IDs in the References 435 header line are parent and child. 437 Do not create a parent/child link if creating that link 438 would introduce a loop. For example, before making 439 message A the parent of B, make sure that A is not a 440 descendent of B. 
442 Note: Message ID comparisons are case-sensitive. 444 (B) Create a parent/child link between the last reference 445 (or NIL if there are no references) and the current message. 446 If the current message already has a parent, it is probably 447 the result of a truncated References header line, so break 448 the current parent/child link before creating the new 449 correct one. As in step 1.A, do not create the parent/child 450 link if creating that link would introduce a loop. Note 451 that if this message has no references, that it will now 452 have no parent. 454 Note: The parent/child links created in steps 1.A and 1.B 455 MUST be kept consistent with one another at ALL times. 457 (2) Gather together all of the messages that have no parents 458 and make them all children (siblings of one another) of a dummy 459 parent (the "root"). These messages constitute the first 460 (head) message of the threads created thus far. 462 (3) Prune dummy messages from the thread tree. Traverse each 463 thread under the root, and for each message: 465 If it is a dummy message with NO children, delete it. 467 If it is a dummy message with children, delete it, but 468 promote its children to the current level. In other words, 469 splice them in with the dummy's siblings. 471 Do not promote the children if doing so would make them 472 children of the root, unless there is only one child. 474 (4) Sort the messages under the root (top-level siblings only) 475 by sent date as described in section 2.2. In the case of a 476 dummy message, sort its children by sent date and then use the 477 first child for the top-level sort. 479 (5) Gather together messages under the root that have the same 480 base subject text. 482 (A) Create a table for associating base subjects with 483 messages, called the subject table. 485 (B) Populate the subject table with one message per each 486 base subject. 
For each child of the root: 488 (i) Find the subject of this thread, by using the base 489 subject from either the current message or its first 490 child if the current message is a dummy. This is the 491 thread subject. 493 (ii) If the thread subject is empty, skip this message. 495 (iii) Look up the message associated with the thread 496 subject in the subject table. 498 (iv) If there is no message in the subject table with the 499 thread subject, add the current message and the thread 500 subject to the subject table. 502 Otherwise, if the message in the subject table is not a 503 dummy, AND either of the following criteria is true: 505 The current message is a dummy, OR 507 The message in the subject table is a reply or forward 508 and the current message is not. 510 then replace the message in the subject table with the 511 current message. 513 (C) Merge threads with the same thread subject. For each 514 child of the root: 516 (i) Find the message's thread subject as in step 5.B.i 517 above. 519 (ii) If the thread subject is empty, skip this message. 521 (iii) Look up the message associated with this thread 522 subject in the subject table. 524 (iv) If the message in the subject table is the current 525 message, skip this message. 527 Otherwise, merge the current message with the one in the 528 subject table using the following rules: 530 If both messages are dummies, append the current 531 message's children to the children of the message in 532 the subject table (the children of both messages 533 become siblings), and then delete the current message. 535 If the message in the subject table is a dummy and the 536 current message is not, make the current message a 537 child of the message in the subject table (a sibling 538 of its children). 540 If the current message is a reply or forward and the 541 message in the subject table is not, make the current 542 message a child of the message in the subject table (a 543 sibling of its children). 
545 Otherwise, create a new dummy message and make both 546 the current message and the message in the subject 547 table children of the dummy. Then replace the message 548 in the subject table with the dummy message. 550 Note: Subject comparisons are case-insensitive, as 551 described under "Internationalization 552 Considerations." 554 (6) Traverse the messages under the root and sort each set of 555 siblings by sent date as described in section 2.2. Traverse 556 the messages in such a way that the "youngest" set of siblings 557 are sorted first, and the "oldest" set of siblings are sorted 558 last (grandchildren are sorted before children, etc). In the 559 case of a dummy message (which can only occur with top-level 560 siblings), use its first child for sorting. 562 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 563 S: * THREAD (166)(167)(168)(169)(172)(170)(171) 564 (173)(174 (175)(176)(178)(181)(180))(179)(177 565 (183)(182)(188)(184)(185)(186)(187)(189))(190) 566 (191)(192)(193)(194 195)(196 (197)(198))(199) 567 (200 202)(201)(203)(204)(205)(206 207)(208) 568 S: A283 OK THREAD completed 569 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" 570 S: * THREAD 571 S: A284 OK THREAD completed 572 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000 573 S: * THREAD (166)(167)(168)(169)(172)((170)(179)) 574 (171)(173)((174)(175)(176)(178)(181)(180)) 575 ((177)(183)(182)(188 (184)(189))(185 186)(187)) 576 (190)(191)(192)(193)((194)(195 196))(197 198) 577 (199)(200 202)(201)(203)(204)(205 206 207)(208) 578 S: A285 OK THREAD completed 580 Note: The line breaks in the first and third server 581 responses are for editorial clarity and do not appear in 582 real THREAD responses. 584 4. Additional Responses 586 These responses are extensions to the [IMAP] base protocol. 588 The section headings of these responses are intended to correspond 589 with where they would be located in the main document. 591 BASE.7.2.SORT. 
SORT Response 593 Data: zero or more numbers 595 The SORT response occurs as a result of a SORT or UID SORT 596 command. The number(s) refer to those messages that match the 597 search criteria. For SORT, these are message sequence numbers; 598 for UID SORT, these are unique identifiers. Each number is 599 delimited by a space. 601 Example: S: * SORT 2 3 6 603 BASE.7.2.THREAD. THREAD Response 605 Data: zero or more threads 607 The THREAD response occurs as a result of a THREAD or UID THREAD 608 command. It contains zero or more threads. A thread consists of 609 a parenthesized list of thread members. 611 Thread members consist of zero or more message numbers, delimited 612 by spaces, indicating successive parent and child. This continues 613 until the thread splits into multiple sub-threads, at which point 614 the thread nests into multiple sub-threads with the first member 615 of each subthread being siblings at this level. There is no limit 616 to the nesting of threads. 618 The message numbers refer to those messages that match the search 619 criteria. For THREAD, these are message sequence numbers; for UID 620 THREAD, these are unique identifiers. 622 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96)) 624 The first thread consists only of message 2. The second thread 625 consists of the messages 3 (parent) and 6 (child), after which it 626 splits into two subthreads; the first of which contains messages 4 627 (child of 6, sibling of 44) and 23 (child of 4), and the second of 628 which contains messages 44 (child of 6, sibling of 4), 7 (child of 629 44), and 96 (child of 7). Since some later messages are parents 630 of earlier messages, the messages were probably moved from some 631 other mailbox at different times. 
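A client might turn the thread data described above into a tree with a small recursive parser. The following Python sketch is illustrative only: the function name `parse_thread` and the `(number, children)` tuple representation are assumptions, the input is assumed to be the well-formed text that follows the THREAD keyword, and a parent that is absent from the response is represented as `None`.

```python
import re


def parse_thread(data):
    """Parse THREAD response data into a list of (number, children) nodes."""
    tokens = re.findall(r"\(|\)|\d+", data)
    pos = 0

    def parse_list():
        # Parse one parenthesized thread-list and return its root node.
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        numbers = []
        while tokens[pos].isdigit():
            numbers.append(int(tokens[pos]))
            pos += 1
        children = []
        while tokens[pos] == "(":
            children.append(parse_list())
        if tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        if not numbers:
            # The sub-threads share a parent that is not in the response.
            return (None, children)
        # Successive numbers are parent and child; the nested
        # sub-threads hang off the last number in the run.
        node = (numbers[-1], children)
        for n in reversed(numbers[:-1]):
            node = (n, [node])
        return node

    forest = []
    while pos < len(tokens):
        forest.append(parse_list())
    return forest


print(parse_thread("(2)(3 6 (4 23)(44 7 96))"))
```

For the example response, the second thread comes back as message 3 with the single child 6, which in turn has the two children 4 and 44.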
633 -- 2 635 -- 3 636 \-- 6 637 |-- 4 638 | \-- 23 639 | 640 \-- 44 641 \-- 7 642 \-- 96 644 Example: S: * THREAD ((3)(5)) 646 In this example, 3 and 5 are siblings of a parent which does not 647 match the search criteria (and/or does not exist in the mailbox); 648 however they are members of the same thread. 650 5. Formal Syntax of SORT and THREAD Commands and Responses 652 The following syntax specification uses the Augmented Backus-Naur 653 Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF] 654 rules defined in [IMAP]. 656 sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria 658 sort-criteria = "(" sort-criterion *(SP sort-criterion) ")" 660 sort-criterion = ["REVERSE" SP] sort-key 662 sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" / 663 "SUBJECT" / "TO" 665 thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria 667 thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext 669 thread-alg-ext = atom 670 ; New algorithms MUST be registered with IANA 672 search-criteria = charset 1*(SP search-key) 674 charset = atom / quoted 675 ; CHARSET values MUST be registered with IANA 677 sort-data = "SORT" *(SP nz-number) 679 thread-data = "THREAD" [SP 1*thread-list] 681 thread-list = "(" (thread-members / thread-nested) ")" 683 thread-members = nz-number *(SP nz-number) [SP thread-nested] 685 thread-nested = 2*thread-list 687 The following syntax describes base subject extraction rules (2)-(6): 689 subject = *subj-leader [subj-middle] *subj-trailer 691 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" 693 subj-blob = "[" *BLOBCHAR "]" *WSP 695 subj-fwd = subj-fwd-hdr subject subj-fwd-trl 697 subj-fwd-hdr = "[fwd:" 699 subj-fwd-trl = "]" 701 subj-leader = (*subj-blob subj-refwd) / WSP 703 subj-middle = *subj-blob (subj-base / subj-fwd) 704 ; last subj-blob is subj-base if subj-base would 705 ; otherwise be empty 707 subj-trailer = "(fwd)" / WSP 709 subj-base = NONWSP *(*WSP NONWSP) 710 ; can be a subj-blob 712 BLOBCHAR = %x01-5a 
/ %x5c / %x5e-ff 713 ; any CHAR8 except '[' and ']' 715 NONWSP = %x01-08 / %x0a-1f / %x21-ff 716 ; any CHAR8 other than WSP 718 6. Security Considerations 720 The SORT and THREAD extensions do not raise any security 721 considerations that are not present in the base [IMAP] protocol, and 722 these issues are discussed in [IMAP]. Nevertheless, it is important 723 to remember that [IMAP] protocol transactions, including message 724 data, are sent in the clear over the network unless protection from 725 snooping is negotiated, either by the use of STARTTLS, by privacy 726 protection negotiated in the AUTHENTICATE command, or by some other 727 protection mechanism. 729 Although not a security consideration, it is important to recognize 730 that sorting by REFERENCES can lead to misleading threading trees. 731 For example, a message with false References: header data will cause 732 a thread to be incorporated into another thread. 734 The process of extracting the base subject may lead to incorrect 735 collation if the extracted data was significant text as opposed to 736 a subject artifact. 738 7. Internationalization Considerations 740 As stated in the introduction, the rules of I18NLEVEL=1 as described 741 in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD 742 extensions MUST collate strings according to the i;unicode-casemap 743 collation described in [UNICASEMAP]. Servers SHOULD also advertise 744 the I18NLEVEL=1 extension. Alternatively, a server MAY implement 745 I18NLEVEL=2 (or higher) and comply with the rules of that level. 747 As discussed in [IMAP-I18N] section 4.5, all server implementations 748 should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2 749 extension. 751 Translations of the "re" or "fw"/"fwd" tokens are not specified for 752 removal in the base subject extraction process. An attempt to add 753 such translated tokens would result in a geometrically complex, and 754 ultimately unimplementable, task. 
756     Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
757     (from the Latin "res", in the matter of) be used to identify a reply.
758     Although the multiple forms of token used to identify a forwarded
759     message show that there is considerable variation found in the
760     wild, the variations are (still) manageable. Consequently, it
761     is suggested that "re:" and one of the variations of the tokens for
762     forward supported by the base subject extraction rules be adopted for
763     Internet mail messages, since doing so makes it a simple display-time
764     task to localize the token language for the user.

766  8. IANA Considerations

768     [IMAP] capabilities are registered by publishing a standards track or
769     IESG approved experimental RFC. This document constitutes
770     registration of the SORT and THREAD capabilities in the [IMAP]
771     capabilities registry.

773     This document creates a new [IMAP] threading algorithms registry,
774     which registers threading algorithms by publishing a standards track
775     or IESG approved experimental RFC. This document constitutes
776     registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
777     registry.

779  9. Normative References

781     The following documents are normative to this document:

783     [ABNF]          Crocker, D. and Overell, P. "Augmented BNF
784                     for Syntax Specifications: ABNF", RFC 5234,
785                     January 2008.

787     [CHARSET]       Freed, N. and Postel, J. "IANA Character Set
788                     Registration Procedures", RFC 2978, October
789                     2000.

791     [IMAP]          Crispin, M. "Internet Message Access Protocol -
792                     Version 4rev1", RFC 3501, March 2003.

794     [IMAP-I18N]     Newman, C. and Gulbrandsen, A. "Internet
795                     Message Access Protocol Internationalization",
796                     Work in Progress.

798     [KEYWORDS]      Bradner, S. "Key words for use in RFCs to
799                     Indicate Requirement Levels", BCP 14, RFC 2119,
800                     March 1997.

802     [RFC-2822]      Resnick, P. "Internet Message Format", RFC
803                     2822, April 2001.

805     [UNICASEMAP]    Crispin, M. "i;unicode-casemap - Simple Unicode
806                     Collation Algorithm", RFC 5051.

808  10. Informative References

810     The following documents are informative to this document:

812     [IMAP-MODELS]   Crispin, M. "Distributed Electronic Mail Models
813                     in IMAP4", RFC 1733, December 1994.

815     [THREADING]     Zawinski, J. "Message Threading",
816                     http://www.jwz.org/doc/threading.html,
817                     1997-2002.

819  Appendices

821  Authors' Addresses

823     Mark R. Crispin
824     Networks and Distributed Computing
825     University of Washington
826     4545 15th Avenue NE
827     Seattle, WA 98105-4527

829     Phone: +1 (206) 543-5762

831     EMail: MRC@CAC.Washington.EDU

833     Kenneth Murchison
834     Carnegie Mellon University
835     5000 Forbes Avenue
836     Cyert Hall 285
837     Pittsburgh, PA 15213

839     Phone: +1 (412) 268-2638
840     EMail: murch@andrew.cmu.edu

842  Full Copyright Statement

844     Copyright (C) The IETF Trust (2008).

846     This document is subject to the rights, licenses and restrictions
847     contained in BCP 78, and except as set forth therein, the authors
848     retain all their rights.

850     This document and the information contained herein are provided on an
851     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
852     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
853     THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
854     OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
855     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
856     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

858  Intellectual Property

860     The IETF takes no position regarding the validity or scope of any
861     Intellectual Property Rights or other rights that might be claimed to
862     pertain to the implementation or use of the technology described in
863     this document or the extent to which any license under such rights
864     might or might not be available; nor does it represent that it has
865     made any independent effort to identify any such rights. Information
866     on the procedures with respect to rights in RFC documents can be
867     found in BCP 78 and BCP 79.

869     Copies of IPR disclosures made to the IETF Secretariat and any
870     assurances of licenses to be made available, or the result of an
871     attempt made to obtain a general license or permission for the use of
872     such proprietary rights by implementers or users of this
873     specification can be obtained from the IETF on-line IPR repository at
874     http://www.ietf.org/ipr.

876     The IETF invites any interested party to bring to its attention any
877     copyrights, patents or patent applications, or other proprietary
878     rights that may cover technology that may be required to implement
879     this standard. Please address the information to the IETF at
880     ietf-ipr@ietf.org.

882  Acknowledgement

884     Funding for the RFC Editor function is currently provided by the
885     Internet Society.