idnits 2.17.1 draft-ietf-imapext-sort-18.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 14. -- Found old boilerplate from RFC 3978, Section 5.5 on line 844. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 855. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 862. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 868. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 901 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 71 instances of too long lines in the document, the longest one being 1 character in excess of 72. ** The abstract seems to contain references ([IMAP]), which it shouldn't. 
Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (November 2006) is 6372 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 4234 (ref. 'ABNF') (Obsoleted by RFC 5234) -- Possible downref: Non-RFC (?) normative reference: ref. 'COMPARATOR' ** Obsolete normative reference: RFC 3501 (ref. 'IMAP') (Obsoleted by RFC 9051) -- Possible downref: Non-RFC (?) normative reference: ref. 'IMAP-I18N' ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322) Summary: 10 errors (**), 0 flaws (~~), 3 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IMAP Extensions Working Group M. Crispin 3 INTERNET-DRAFT: IMAP SORT K. 
Murchison 4 Document: internet-drafts/draft-ietf-imapext-sort-18.txt November 2006 6 INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS 8 Status of this Memo 10 By submitting this Internet-Draft, each author represents that 11 any applicable patent or other IPR claims of which he or she is 12 aware have been or will be disclosed, and any of which he or she 13 becomes aware will be disclosed, in accordance with Section 6 of 14 BCP 79. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as 19 Internet-Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet-Drafts as reference 24 material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 A revised version of this document will be submitted to the RFC 33 editor as an Informational Document for the Internet Community. 35 A revised version of this draft document will be submitted to the RFC 36 editor as a Proposed Standard for the Internet Community. Discussion 37 and suggestions for improvement are requested, and should be sent to 38 ietf-imapext@IMC.ORG. This document will expire before 16 May 2007. 39 Distribution of this memo is unlimited. 41 Abstract 43 This document describes the base-level server-based sorting and 44 threading extensions to the [IMAP] protocol. These extensions 45 provide substantial performance improvements for IMAP clients which 46 offer sorted and threaded views. 48 1. 
Introduction 50 The SORT and THREAD extensions to the [IMAP] protocol provide a means 51 of server-based sorting and threading of messages, without requiring 52 that the client download the necessary data to do so itself. This is 53 particularly useful for online clients as described in [IMAP-MODELS]. 55 A server which supports the base-level SORT extension indicates this 56 with a capability name which starts with "SORT". Future, 57 upwards-compatible extensions to the SORT extension will all start 58 with "SORT", indicating support for this base level. 60 A server which supports the THREAD extension indicates this with one 61 or more capability names consisting of "THREAD=" followed by a 62 supported threading algorithm name as described in this document. 63 This provides for future upwards-compatible extensions. 65 A server which implements the SORT and/or THREAD extensions SHOULD 66 also implement the COMPARATOR extension as described in [IMAP-I18N]. 68 2. Terminology 70 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 71 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 72 document are to be interpreted as described in [KEYWORDS]. 74 The word "can" (not "may") is used to refer to a possible 75 circumstance or situation, as opposed to an optional facility of the 76 protocol. 78 "User" is used to refer to a human user, whereas "client" refers to 79 the software being run by the user. 81 In examples, "C:" and "S:" indicate lines sent by the client and 82 server respectively. 84 2.1 Base Subject 86 Subject sorting and threading use the "base subject," which has 87 specific subject artifacts removed. Due to the complexity of these 88 artifacts, the formal syntax for the subject extraction rules is 89 ambiguous. 
The following procedure is followed to determine the 90 "base subject", using the [ABNF] formal syntax rules described in 91 section 5: 93 (1) Convert any RFC 2047 encoded-words in the subject to 94 UTF-8 as described in "internationalization 95 considerations." Convert all tabs and continuations to 96 space. Convert all multiple spaces to a single space. 98 (2) Remove all trailing text of the subject that matches 99 the subj-trailer ABNF, repeat until no more matches are 100 possible. 102 (3) Remove all prefix text of the subject that matches the 103 subj-leader ABNF. 105 (4) If there is prefix text of the subject that matches the 106 subj-blob ABNF, and removing that prefix leaves a non-empty 107 subj-base, then remove the prefix text. 109 (5) Repeat (3) and (4) until no matches remain. 111 Note: it is possible to defer step (2) until step (6), but this 112 requires checking for subj-trailer in step (4). 114 (6) If the resulting text begins with the subj-fwd-hdr ABNF 115 and ends with the subj-fwd-trl ABNF, remove the 116 subj-fwd-hdr and subj-fwd-trl and repeat from step (2). 118 (7) The resulting text is the "base subject" used in the 119 SORT. 121 All servers and disconnected (as described in [IMAP-MODELS]) clients 122 MUST use exactly this algorithm to determine the "base subject". 123 Otherwise there is potential for a user to get inconsistent results 124 based on whether they are running in connected or disconnected mode. 126 2.2 Sent Date 128 As used in this document, the term "sent date" refers to the date and 129 time from the Date: header, adjusted by time zone to normalize to 130 UTC. For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the 131 UTC date and time of "1 Jan 2001 00:01:33 +0000". 133 If the time zone is invalid, the date and time SHOULD be treated as 134 UTC. If the time is also invalid, the time SHOULD be treated as 135 00:00:00. 
If there is no valid date or time, the date and time 136 SHOULD be treated as 00:00:00 on the earliest possible date. 138 This differs from the date-related criteria in the SEARCH command 139 (described in [IMAP] section 6.4.4), which use just the date and not 140 the time, and are not adjusted by time zone. 142 3. Additional Commands 144 These commands are extensions to the [IMAP] base protocol. 146 The section headings are intended to correspond with where they would 147 be located in the main document if they were part of the base 148 specification. 150 BASE.6.4.SORT. SORT Command 152 Arguments: sort program 153 charset specification 154 searching criteria (one or more) 156 Data: untagged responses: SORT 158 Result: OK - sort completed 159 NO - sort error: can't sort that charset or 160 criteria 161 BAD - command unknown or arguments invalid 163 The SORT command is a variant of SEARCH with sorting semantics for 164 the results. Sort has two arguments before the searching criteria 165 argument; a parenthesized list of sort criteria, and the searching 166 charset. 168 The charset argument is mandatory (unlike SEARCH) and indicates 169 the [CHARSET] of the strings that appear in the searching 170 criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 171 All other charsets are optional. 173 There is also a UID SORT command which returns unique identifiers 174 instead of message sequence numbers. Note that there are separate 175 searching criteria for message sequence numbers and UIDs; thus the 176 arguments to UID SORT are interpreted the same as in SORT. This 177 is analogous to the behavior of UID SEARCH, as opposed to UID 178 COPY, UID FETCH, or UID STORE. 180 The SORT command first searches the mailbox for messages that 181 match the given searching criteria using the charset argument for 182 the interpretation of strings in the searching criteria. 
It then 183 returns the matching messages in an untagged SORT response, sorted 184 according to one or more sort criteria. 186 Sorting is in ascending order. Earlier dates sort before later 187 dates; smaller sizes sort before larger sizes; and strings are 188 sorted according to ascending values established by their 189 collation algorithm (see under "Internationalization 190 Considerations"). 192 If two or more messages exactly match according to the sorting 193 criteria, these messages are sorted according to the order in 194 which they appear in the mailbox. In other words, there is an 195 implicit sort criterion of "sequence number". 197 When multiple sort criteria are specified, the result is sorted in 198 the priority order that the criteria appear. For example, 199 (SUBJECT DATE) will sort messages in order by their base subject 200 text; and for messages with the same base subject text will sort 201 by their sent date. 203 Untagged EXPUNGE responses are not permitted while the server is 204 responding to a SORT command, but are permitted during a UID SORT 205 command. 207 The defined sort criteria are as follows. Refer to the Formal 208 Syntax section for the precise syntactic definitions of the 209 arguments. If the associated RFC-822 header for a particular 210 criterion is absent, it is treated as the empty string. The empty 211 string always collates before non-empty strings. 213 ARRIVAL 214 Internal date and time of the message. This differs from the 215 ON criteria in SEARCH, which uses just the internal date. 217 CC 218 [IMAP] addr-mailbox of the first "cc" address. 220 DATE 221 Sent date and time from the Date: header, adjusted by time 222 zone. This differs from the SENTON criteria in SEARCH, which 223 uses just the date and not the time, nor adjusts by time zone. 225 FROM 226 [IMAP] addr-mailbox of the first "From" address. 228 REVERSE 229 Followed by another sort criterion, has the effect of that 230 criterion but in reverse (descending) order. 
231 Note: REVERSE only reverses a single criterion, and does not 232 affect the implicit "sequence number" sort criterion if all 233 other criteria are identical. Consequently, a sort of 234 REVERSE SUBJECT is not the same as a reverse ordering of a 235 SUBJECT sort. This can be avoided by use of additional 236 criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE 237 DATE. In general, however, it's better (and faster, if the 238 client has a "reverse current ordering" command) to reverse 239 the results in the client instead of issuing a new SORT. 241 SIZE 242 Size of the message in octets. 244 SUBJECT 245 Base subject text. 247 TO 248 [IMAP] addr-mailbox of the first "To" address. 250 Example: C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994 251 S: * SORT 2 84 882 252 S: A282 OK SORT completed 253 C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL 254 S: * SORT 5 3 4 1 2 255 S: A283 OK SORT completed 256 C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox" 257 S: * SORT 258 S: A284 OK SORT completed 260 BASE.6.4.THREAD. THREAD Command 262 Arguments: threading algorithm 263 charset specification 264 searching criteria (one or more) 266 Data: untagged responses: THREAD 268 Result: OK - thread completed 269 NO - thread error: can't thread that charset or 270 criteria 271 BAD - command unknown or arguments invalid 273 The THREAD command is a variant of SEARCH with threading semantics 274 for the results. Thread has two arguments before the searching 275 criteria argument; a threading algorithm, and the searching 276 charset. 278 The charset argument is mandatory (unlike SEARCH) and indicates 279 the [CHARSET] of the strings that appear in the searching 280 criteria. The US-ASCII and UTF-8 charsets MUST be implemented. 281 All other charsets are optional. 283 There is also a UID THREAD command which returns unique 284 identifiers instead of message sequence numbers. 
Note that there 285 are separate searching criteria for message sequence numbers and 286 UIDs; thus the arguments to UID THREAD are interpreted the same as 287 in THREAD. This is analogous to the behavior of UID SEARCH, as 288 opposed to UID COPY, UID FETCH, or UID STORE. 290 The THREAD command first searches the mailbox for messages that 291 match the given searching criteria using the charset argument for 292 the interpretation of strings in the searching criteria. It then 293 returns the matching messages in an untagged THREAD response, 294 threaded according to the specified threading algorithm. 296 All collation is in ascending order. Earlier dates collate before 297 later dates and strings are collated according to ascending values 298 established by their collation algorithm (see under 299 "Internationalization Considerations"). 301 Untagged EXPUNGE responses are not permitted while the server is 302 responding to a THREAD command, but are permitted during a UID 303 THREAD command. 305 The defined threading algorithms are as follows: 307 ORDEREDSUBJECT 309 The ORDEREDSUBJECT threading algorithm is also referred to as 310 "poor man's threading." The searched messages are sorted by 311 base subject and then by the sent date. The messages are then 312 split into separate threads, with each thread containing 313 messages with the same base subject text. Finally, the threads 314 are sorted by the sent date of the first message in the thread. 316 The first messages of all threads are siblings of each other 317 (the "root"). The second message of a thread is the child of 318 the first message, and subsequent messages of the thread are 319 siblings of the second message and hence children of the 320 message at the root. Hence, there are no grandchildren in 321 ORDEREDSUBJECT threading. 323 Note: early drafts of this specification specified 324 that each message in an ORDEREDSUBJECT thread is a child 325 (as opposed to a sibling) of the previous message. 
This 326 is now deprecated. For compatibility with servers which 327 may still use the old definition, client implementations 328 SHOULD treat descendants of a child as being siblings of 329 that child. 331 This is because the old definition mistakenly indicated 332 that there was a parent/child relationship between 333 successive messages in a thread, when in fact there was 334 only a chronological relationship. In clients which 335 indicate parent/child relationships in a thread tree, 336 this would indicate levels of descent which did not 337 exist. 339 REFERENCES 341 The REFERENCES threading algorithm is based on the [THREADING] 342 algorithm written and used in "Netscape Mail and News" versions 2.0 343 through 3.0. This algorithm threads the searched messages by 344 grouping them together in parent/child relationships based on 345 which messages are replies to others. The parent/child 346 relationships are built using two methods: reconstructing a 347 message's ancestry using the references contained within it; 348 and checking the original (not base) subject of a message to 349 see if it is a reply to (or forward of) another message. 351 Note: "Message ID" in the following description refers to a 352 normalized form of the msg-id in [RFC-2822]. The actual 353 text in an RFC 2822 message may use quoting, resulting in multiple 354 ways of expressing the same Message ID. Implementations of 355 the REFERENCES threading algorithm MUST normalize any msg-id 356 in order to avoid false non-matches due to differences in 357 quoting. 359 For example, the msg-id 360 <"01KF8JCEOCBS0045PS"@xxx.yyy.com> 361 and the msg-id 362 <01KF8JCEOCBS0045PS@xxx.yyy.com> 363 MUST be interpreted as being the same Message ID. 365 The references used for reconstructing a message's ancestry are 366 found using the following rules: 368 If a message contains a References header line, then use the 369 Message IDs in the References header line as the references. 
371 If a message does not contain a References header line, or 372 the References header line does not contain any valid 373 Message IDs, then use the first (if any) valid Message ID 374 found in the In-Reply-To header line as the only reference 375 (parent) for this message. 377 Note: Although [RFC-2822] permits multiple Message IDs in 378 the In-Reply-To header, in actual practice this 379 discipline has not been followed. For example, 380 In-Reply-To headers have been observed with message 381 addresses after the Message ID, and there are no good 382 heuristics for software to determine the difference. 383 This is not a problem with the References header however. 385 If a message does not contain an In-Reply-To header line, or 386 the In-Reply-To header line does not contain a valid Message 387 ID, then the message does not have any references (NIL). 389 A message is considered to be a reply or forward if the base 390 subject extraction rules, applied to the original subject, 391 remove any of the following: a subj-refwd, a "(fwd)" 392 subj-trailer, or a subj-fwd-hdr and subj-fwd-trl. 394 The REFERENCES algorithm is significantly more complex than 395 ORDEREDSUBJECT and consists of six main steps. These steps are 396 outlined in detail below. 398 (1) For each searched message: 400 (A) Using the Message IDs in the message's references, link 401 the corresponding messages (those whose Message-ID header 402 line contains the given reference Message ID) together as 403 parent/child. Make the first reference the parent of the 404 second (and the second a child of the first), the second the 405 parent of the third (and the third a child of the second), 406 etc. The following rules govern the creation of these 407 links: 409 If a message does not contain a Message-ID header line, 410 or the Message-ID header line does not contain a valid 411 Message ID, then assign a unique Message ID to this 412 message. 
414 If two or more messages have the same Message ID, then 415 only use that Message ID in the first (lowest sequence 416 number) message, and assign a unique Message ID to each 417 of the subsequent messages with a duplicate of that 418 Message ID. 420 If no message can be found with a given Message ID, 421 create a dummy message with this ID. Use this dummy 422 message for all subsequent references to this ID. 424 If a message already has a parent, don't change the 425 existing link. This is done because the References 426 header line may have been truncated by a MUA. As a 427 result, there is no guarantee that the messages 428 corresponding to adjacent Message IDs in the References 429 header line are parent and child. 431 Do not create a parent/child link if creating that link 432 would introduce a loop. For example, before making 433 message A the parent of B, make sure that A is not a 434 descendant of B. 436 Note: Message ID comparisons are case-sensitive. 438 (B) Create a parent/child link between the last reference 439 (or NIL if there are no references) and the current message. 440 If the current message already has a parent, it is probably 441 the result of a truncated References header line, so break 442 the current parent/child link before creating the new 443 correct one. As in step 1.A, do not create the parent/child 444 link if creating that link would introduce a loop. Note 445 that if this message has no references, it will now 446 have no parent. 448 Note: The parent/child links created in steps 1.A and 1.B 449 MUST be kept consistent with one another at ALL times. 451 (2) Gather together all of the messages that have no parents 452 and make them all children (siblings of one another) of a dummy 453 parent (the "root"). These messages constitute the first 454 (head) messages of the threads created thus far. 456 (3) Prune dummy messages from the thread tree. 
Traverse each 457 thread under the root, and for each message: 459 If it is a dummy message with NO children, delete it. 461 If it is a dummy message with children, delete it, but 462 promote its children to the current level. In other words, 463 splice them in with the dummy's siblings. 465 Do not promote the children if doing so would make them 466 children of the root, unless there is only one child. 468 (4) Sort the messages under the root (top-level siblings only) 469 by sent date. In the case of an exact match on sent date, use 470 the order in which the messages appear in the mailbox (that is, 471 by sequence number) to determine the order. In the case of a 472 dummy message, sort its children by sent date and then use the 473 first child for the top-level sort. If the sent date can not 474 be determined (a Date: header is missing or can not be parsed), 475 the INTERNALDATE for that message is used as the sent date. 477 (5) Gather together messages under the root that have the same 478 base subject text. 480 (A) Create a table for associating base subjects with 481 messages, called the subject table. 483 (B) Populate the subject table with one message per each 484 base subject. For each child of the root: 486 (i) Find the subject of this thread, by using the base 487 subject from either the current message or its first 488 child if the current message is a dummy. This is the 489 thread subject. 491 (ii) If the thread subject is empty, skip this message. 493 (iii) Look up the message associated with the thread 494 subject in the subject table. 496 (iv) If there is no message in the subject table with the 497 thread subject, add the current message and the thread 498 subject to the subject table. 500 Otherwise, if the message in the subject table is not a 501 dummy, AND either of the following criteria are true: 503 The current message is a dummy, OR 505 The message in the subject table is a reply or forward 506 and the current message is not. 
508 then replace the message in the subject table with the 509 current message. 511 (C) Merge threads with the same thread subject. For each 512 child of the root: 514 (i) Find the message's thread subject as in step 5.B.i 515 above. 517 (ii) If the thread subject is empty, skip this message. 519 (iii) Look up the message associated with this thread 520 subject in the subject table. 522 (iv) If the message in the subject table is the current 523 message, skip this message. 525 Otherwise, merge the current message with the one in the 526 subject table using the following rules: 528 If both messages are dummies, append the current 529 message's children to the children of the message in 530 the subject table (the children of both messages 531 become siblings), and then delete the current message. 533 If the message in the subject table is a dummy and the 534 current message is not, make the current message a 535 child of the message in the subject table (a sibling 536 of its children). 538 If the current message is a reply or forward and the 539 message in the subject table is not, make the current 540 message a child of the message in the subject table (a 541 sibling of its children). 543 Otherwise, create a new dummy message and make both 544 the current message and the message in the subject 545 table children of the dummy. Then replace the message 546 in the subject table with the dummy message. 548 Note: Subject comparisons are case-insensitive, as 549 described under "Internationalization 550 Considerations." 552 (6) Traverse the messages under the root and sort each set of 553 siblings by sent date. Traverse the messages in such a way 554 that the "youngest" set of siblings is sorted first, and the 555 "oldest" set of siblings is sorted last (grandchildren are 556 sorted before children, etc). 
In the case of an exact match on 557 sent date or if either of the Date: headers used in a 558 comparison can not be parsed, use the order in which the 559 messages appear in the mailbox (that is, by sequence number) to 560 determine the order. In the case of a dummy message (which can 561 only occur with top-level siblings), use its first child for 562 sorting. 564 Example: C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000 565 S: * THREAD (166)(167)(168)(169)(172)(170)(171) 566 (173)(174 (175)(176)(178)(181)(180))(179)(177 567 (183)(182)(188)(184)(185)(186)(187)(189))(190) 568 (191)(192)(193)(194 195)(196 (197)(198))(199) 569 (200 202)(201)(203)(204)(205)(206 207)(208) 570 S: A283 OK THREAD completed 571 C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp" 572 S: * THREAD 573 S: A284 OK THREAD completed 574 C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000 575 S: * THREAD (166)(167)(168)(169)(172)((170)(179)) 576 (171)(173)((174)(175)(176)(178)(181)(180)) 577 ((177)(183)(182)(188 (184)(189))(185 186)(187)) 578 (190)(191)(192)(193)((194)(195 196))(197 198) 579 (199)(200 202)(201)(203)(204)(205 206 207)(208) 580 S: A285 OK THREAD completed 582 Note: The line breaks in the first and third server 583 responses are for editorial clarity and do not appear in 584 real THREAD responses. 586 4. Additional Responses 588 These responses are extensions to the [IMAP] base protocol. 590 The section headings of these responses are intended to correspond 591 with where they would be located in the main document. 593 BASE.7.2.SORT. SORT Response 595 Data: zero or more numbers 597 The SORT response occurs as a result of a SORT or UID SORT 598 command. The number(s) refer to those messages that match the 599 search criteria. For SORT, these are message sequence numbers; 600 for UID SORT, these are unique identifiers. Each number is 601 delimited by a space. 603 Example: S: * SORT 2 3 6 605 BASE.7.2.THREAD. 
THREAD Response 607 Data: zero or more threads 609 The THREAD response occurs as a result of a THREAD or UID THREAD 610 command. It contains zero or more threads. A thread consists of 611 a parenthesized list of thread members. 613 Thread members consist of zero or more message numbers, delimited 614 by spaces, indicating successive parent and child. This continues 615 until the thread splits into multiple subthreads, at which point 616 the thread nests into multiple subthreads with the first member 617 of each subthread being siblings at this level. There is no limit 618 to the nesting of threads. 620 The message numbers refer to those messages that match the search 621 criteria. For THREAD, these are message sequence numbers; for UID 622 THREAD, these are unique identifiers. 624 Example: S: * THREAD (2)(3 6 (4 23)(44 7 96)) 626 The first thread consists only of message 2. The second thread 627 consists of the messages 3 (parent) and 6 (child), after which it 628 splits into two subthreads; the first of which contains messages 4 629 (child of 6, sibling of 44) and 23 (child of 4), and the second of 630 which contains messages 44 (child of 6, sibling of 4), 7 (child of 631 44), and 96 (child of 7). Since some later messages are parents 632 of earlier messages, the messages were probably moved from some 633 other mailbox at different times. 635 -- 2 637 -- 3 638 \-- 6 639 |-- 4 640 | \-- 23 641 | 642 \-- 44 643 \-- 7 644 \-- 96 646 Example: S: * THREAD ((3)(5)) 648 In this example, 3 and 5 are siblings of a parent which does not 649 match the search criteria (and/or does not exist in the mailbox); 650 however they are members of the same thread. 652 5. Formal Syntax of SORT and THREAD Commands and Responses 654 The following syntax specification uses the Augmented Backus-Naur 655 Form (ABNF) notation as specified in [ABNF]. It also uses [ABNF] 656 rules defined in [IMAP]. 
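As an informal illustration of the THREAD response structure described in section 4 (it is not part of the specification), the following Python sketch converts the data that follows "* THREAD" into (parent, child) pairs. The function name parse_thread and the pair representation are assumptions made for this example only. A parent of None marks a message whose parent is absent from the response. The sketch does not record which parentless siblings share a thread, and it does not validate its input against the formal syntax.

```python
import re


def parse_thread(data):
    """Return (parent, child) pairs for the text following '* THREAD'.

    Hypothetical helper for illustration; a parent of None means the
    message has no parent among the returned messages.
    """
    # Split the response into parentheses and message numbers.
    tokens = re.findall(r"\(|\)|\d+", data)
    edges = []
    pos = 0

    def parse_list(parent):
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                # A nested list starts a subthread under the current parent.
                parse_list(parent)
            else:
                # Successive numbers form a parent/child chain.
                number = int(tokens[pos])
                pos += 1
                edges.append((parent, number))
                parent = number
        pos += 1  # consume the closing parenthesis

    while pos < len(tokens):
        parse_list(None)
    return edges
```

Applied to the example data "(2)(3 6 (4 23)(44 7 96))", this sketch yields the pairs (None, 2), (None, 3), (3, 6), (6, 4), (4, 23), (6, 44), (44, 7), (7, 96), which agrees with the tree drawn in section 4.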
658 sort = ["UID" SP] "SORT" SP sort-criteria SP search-criteria 660 sort-criteria = "(" sort-criterion *(SP sort-criterion) ")" 662 sort-criterion = ["REVERSE" SP] sort-key 664 sort-key = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" / 665 "SUBJECT" / "TO" 667 thread = ["UID" SP] "THREAD" SP thread-alg SP search-criteria 669 thread-alg = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext 671 thread-alg-ext = atom 672 ; New algorithms MUST be registered with IANA 674 search-criteria = charset 1*(SP search-key) 676 charset = atom / quoted 677 ; CHARSET values MUST be registered with IANA 679 sort-data = "SORT" *(SP nz-number) 681 thread-data = "THREAD" [SP 1*thread-list] 683 thread-list = "(" (thread-members / thread-nested) ")" 685 thread-members = nz-number *(SP nz-number) [SP thread-nested] 687 thread-nested = 2*thread-list 689 The following syntax describes base subject extraction rules (2)-(6): 691 subject = *subj-leader [subj-middle] *subj-trailer 693 subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":" 695 subj-blob = "[" *BLOBCHAR "]" *WSP 697 subj-fwd = subj-fwd-hdr subject subj-fwd-trl 699 subj-fwd-hdr = "[fwd:" 701 subj-fwd-trl = "]" 703 subj-leader = (*subj-blob subj-refwd) / WSP 705 subj-middle = *subj-blob (subj-base / subj-fwd) 706 ; last subj-blob is subj-base if subj-base would 707 ; otherwise be empty 709 subj-trailer = "(fwd)" / WSP 711 subj-base = NONWSP *(*WSP NONWSP) 712 ; can be a subj-blob 714 BLOBCHAR = %x01-5a / %x5c / %x5e-ff 715 ; any CHAR8 except '[' and ']' 717 NONWSP = %x01-08 / %x0a-1f / %x21-ff 718 ; any CHAR8 other than WSP 720 6. Security Considerations 722 The SORT and THREAD extensions do not raise any security 723 considerations that are not present in the base [IMAP] protocol, and 724 these issues are discussed in [IMAP]. 
   Nevertheless, it is important to remember that [IMAP] protocol
   transactions, including message data, are sent in the clear over
   the network unless protection from snooping is in effect, whether
   through the use of STARTTLS, privacy protection negotiated in the
   AUTHENTICATE command, or some other protection mechanism.

7. Internationalization Considerations

   As described in [IMAP-I18N], strings in charsets other than US-ASCII
   and UTF-8 MUST be converted to UTF-8 and compared in ascending order
   according to the selected or active collation algorithm. If the
   server does not support the [IMAP-I18N] COMPARATOR extension, the
   collation algorithm used is the "en;ascii-casemap" collation
   described in [COMPARATOR].

   Translations of the "re" or "fw"/"fwd" tokens are not specified for
   removal in the base subject extraction process. An attempt to add
   such translated tokens would result in a geometrically complex, and
   ultimately unimplementable, task.

   Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
   (from the Latin "res", in the matter of) be used to identify a reply.
   Although the multiple forms of token used to identify a forwarded
   message show that there is considerable variation in the wild, the
   variations are (still) manageable. Consequently, it is suggested
   that "re:" and one of the variations of the tokens for forward
   supported by the base subject extraction rules be adopted for
   Internet mail messages, since doing so makes it a simple
   display-time task to localize the token language for the user.

8. IANA Considerations

   [IMAP] capabilities are registered by publishing a standards track or
   IESG approved experimental RFC. This document constitutes
   registration of the SORT and THREAD capabilities in the [IMAP]
   capabilities registry.

   This document creates a new [IMAP] threading algorithms registry,
   which registers threading algorithms by publishing a standards track
   or IESG approved experimental RFC. This document constitutes
   registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
   registry.

9. Normative References

   The following documents are normative to this document:

   [ABNF]         Crocker, D. and P. Overell, "Augmented BNF for
                  Syntax Specifications: ABNF", RFC 4234, October
                  2005.

   [CHARSET]      Freed, N. and J. Postel, "IANA Character Set
                  Registration Procedures", RFC 2978, October
                  2000.

   [COMPARATOR]   Newman, C., "Internet Application Protocol
                  Collation Registry", Work in Progress.

   [IMAP]         Crispin, M., "Internet Message Access Protocol -
                  Version 4rev1", RFC 3501, March 2003.

   [IMAP-I18N]    Newman, C., "Internet Message Access Protocol
                  Internationalization", Work in Progress.

   [KEYWORDS]     Bradner, S., "Key words for use in RFCs to
                  Indicate Requirement Levels", BCP 14, RFC 2119,
                  March 1997.

   [RFC-2822]     Resnick, P., "Internet Message Format", RFC
                  2822, April 2001.

10. Informative References

   The following documents are informative to this document:

   [IMAP-MODELS]  Crispin, M., "Distributed Electronic Mail Models
                  in IMAP4", RFC 1733, December 1994.

   [THREADING]    Zawinski, J., "Message Threading",
                  http://www.jwz.org/doc/threading.html,
                  1997-2002.

Appendices

Author's Address

   Mark R. Crispin
   Networks and Distributed Computing
   University of Washington
   4545 15th Avenue NE
   Seattle, WA 98105-4527

   Phone: +1 (206) 543-5762

   EMail: MRC@CAC.Washington.EDU

   Kenneth Murchison
   Carnegie Mellon University
   5000 Forbes Avenue
   Cyert Hall 285
   Pittsburgh, PA 15213

   Phone: +1 (412) 268-2638
   Email: murch@andrew.cmu.edu

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.