idnits 2.17.1 

draft-ietf-imapext-sort-17.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3667, Section 5.1 on line 822.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 858.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 869.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 876.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 882.

  ** Found boilerplate matching RFC 3979, Section 5, paragraph 1 (on line
     869), which is fine, but *also* found old RFC 2026, Section 10.4A text on
     line 838.

  ** Found boilerplate matching RFC 3979, Section 5, paragraph 3 (on line
     882), which is fine, but *also* found old RFC 2026, Section 10.4B text on
     line 844.

  ** The document claims conformance with section 10 of RFC 2026, but uses
     some RFC 3978/3979 boilerplate.  As RFC 3978/3979 replaces section 10 of
     RFC 2026, you should not claim conformance with it if you have changed to
     using RFC 3978/3979 boilerplate.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** The document seems to lack an RFC 3978 Section 5.4 Reference to BCP 78
     -- however, there's a paragraph with a matching beginning. Boilerplate
     error?

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate
     instead of verbatim RFC 3978 boilerplate.  After 6 May 2005, submission
     of drafts without verbatim RFC 3978 boilerplate is not accepted.

     The following non-3978 patterns matched text found in the document. 
     That text should be removed or replaced:

        By submitting this Internet-Draft, I certify that any applicable patent
        or other IPR claims of which I am aware have been disclosed, or
        will be disclosed, and any of which I become aware will be
        disclosed, in accordance with RFC 3668.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 917 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 92 instances of too long lines in the document, the longest
     one being 6 characters in excess of 72.

  ** The abstract seems to contain references ([IMAP]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     RFC 2119 keyword, line 55: '...e SORT extension SHOULD also implement...'
     RFC 2119 keyword, line 117: '...    MUST use exactly this algorithm wh...'
     RFC 2119 keyword, line 128: '...he date and time SHOULD be treated as ...'
     RFC 2119 keyword, line 129: '...nvalid, the time SHOULD be treated as ...'
     RFC 2119 keyword, line 130: '...the date and time SHOULD be treated as...'
     (9 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == In addition to RFC 3979, Section 5, paragraph 1 boilerplate, a section
     with a similar start was also found:


        The IETF takes no position regarding the validity or scope of any
        intellectual property or other rights that might be claimed to
        pertain to the implementation or use of the technology described
        in this document or the extent to which any license under such
        rights might or might not be available; neither does it represent
        that it has made any effort to identify any such rights.
        Information on the IETF's procedures with respect to rights in
        standards-track and standards-related documentation can be found
        in BCP-11. Copies of claims of rights made available for
        publication and any assurances of licenses to be made available,
        or the result of an attempt made to obtain a general license or
        permission for the use of such proprietary rights by implementors
        or users of this specification can be obtained from the IETF
        Secretariat.

  == In addition to RFC 3979, Section 5, paragraph 3 boilerplate, a section
     with a similar start was also found:


        The IETF invites any interested party to bring to its attention any
        copyrights, patents or patent applications, or other proprietary
        rights which may cover technology that may be required to
        practice this standard. Please address the information to the
        IETF Executive Director.

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (May 2004) is 7283 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'IMAP' on line 773 looks like a reference

  -- Missing reference section? 'IMAP-MODELS' on line 790 looks like a
     reference

  -- Missing reference section? 'IMAP-I18N' on line 776 looks like a reference

  -- Missing reference section? 'KEYWORDS' on line 779 looks like a reference

  -- Missing reference section? 'ABNF' on line 762 looks like a reference

  -- Missing reference section? 'IMAP-MODEL' on line 116 looks like a
     reference

  -- Missing reference section? 'CHARSET' on line 766 looks like a reference

  -- Missing reference section? 'THREADING' on line 793 looks like a reference

  -- Missing reference section? 'RFC 2822' on line 368 looks like a reference

  -- Missing reference section? 'COMPARATOR' on line 770 looks like a
     reference

  -- Missing reference section? 'RFC-2822' on line 783 looks like a reference


     Summary: 15 errors (**), 0 flaws (~~), 5 warnings (==), 18 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	IMAP Extensions Working Group                                 M. Crispin
2	INTERNET-DRAFT: IMAP SORT                                   K. Murchison
3	Document: internet-drafts/draft-ietf-imapext-sort-17.txt        May 2004

5	      INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS

7	Status of this Memo

9	    This document is an Internet-Draft and is in full conformance with
10	    all provisions of Section 10 of RFC 2026.

12	    Internet-Drafts are working documents of the Internet Engineering
13	    Task Force (IETF), its areas, and its working groups.  Note that
14	    other groups may also distribute working documents as
15	    Internet-Drafts.

17	    Internet-Drafts are draft documents valid for a maximum of six months
18	    and may be updated, replaced, or obsoleted by other documents at any
19	    time.  It is inappropriate to use Internet-Drafts as reference
20	    material or to cite them other than as "work in progress."

22	    The list of current Internet-Drafts can be accessed at
23	    http://www.ietf.org/ietf/1id-abstracts.txt

25	    To view the list of Internet-Draft Shadow Directories, see
26	    http://www.ietf.org/shadow.html.

28	    A revised version of this document will be submitted to the RFC
29	    editor as an Informational Document for the Internet Community.

31	    A revised version of this draft document will be submitted to the RFC
32	    editor as a Proposed Standard for the Internet Community.  Discussion
33	    and suggestions for improvement are requested, and should be sent to
34	    ietf-imapext@IMC.ORG.  This document will expire before 23 November 2004.
35	    Distribution of this memo is unlimited.

37	Abstract

39	    This document describes the base-level server-based sorting and
40	    threading extensions to the [IMAP] protocol.  These extensions
41	    provide substantial performance improvements for IMAP clients which
42	    offer sorted and threaded views.

44	1. Introduction

46	    The SORT and THREAD extensions to the [IMAP] protocol provide a means
47	    of server-based sorting and threading of messages, without requiring
48	    that the client download the necessary data to do so itself.  This is
49	    particularly useful for online clients as described in [IMAP-MODELS].

51	    A server which supports the base-level SORT extension indicates this
52	    with a capability name which starts with "SORT".  Future,
53	    upwards-compatible extensions to the SORT extension will all start
54	    with "SORT", indicating support for this base level.  A server which
55	    implements the SORT extension SHOULD also implement the COMPARATOR
56	    extension as described in [IMAP-I18N].

58	    A server which supports the THREAD extension indicates this with one
59	    or more capability names consisting of "THREAD=" followed by a
60	    supported threading algorithm name as described in this document.
61	    This provides for future upwards-compatible extensions.

63	2. Terminology

65	    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
66	    "SHOULD", "SHOULD NOT", "MAY", and "OPTIONAL" in this document are to
67	    be interpreted as described in [KEYWORDS].

69	    The word "can" (not "may") is used to refer to a possible
70	    circumstance or situation, as opposed to an optional facility of the
71	    protocol.

73	    "User" is used to refer to a human user, whereas "client" refers to
74	    the software being run by the user.

76	    In examples, "C:" and "S:" indicate lines sent by the client and
77	    server respectively.

79	2.1 Base Subject

81	    Subject sorting and threading use the "base subject," which has
82	    specific subject artifacts removed.  Due to the complexity of these
83	    artifacts, the formal syntax for the subject extraction rules is
84	    ambiguous.  The following procedure is followed to determine the
85	    actual "base subject" which is used to sort by subject, using the
86	    [ABNF] formal syntax rules described in section 5:

88	         (1) Convert any RFC 2047 encoded-words in the subject to
89	         UTF-8 as described in "internationalization
90	         considerations."  Convert all tabs and continuations to
91	         space.  Convert all multiple spaces to a single space.

93	         (2) Remove all trailing text of the subject that matches
94	         the subj-trailer ABNF, repeat until no more matches are
95	         possible.

97	         (3) Remove all prefix text of the subject that matches the
98	         subj-leader ABNF.

100	         (4) If there is prefix text of the subject that matches the
101	         subj-blob ABNF, and removing that prefix leaves a non-empty
102	         subj-base, then remove the prefix text.

104	         (5) Repeat (3) and (4) until no matches remain.

106	    Note: it is possible to defer step (2) until step (6), but this
107	    requires checking for subj-trailer in step (4).

109	         (6) If the resulting text begins with the subj-fwd-hdr ABNF
110	         and ends with the subj-fwd-trl ABNF, remove the
111	         subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

113	         (7) The resulting text is the "base subject" used in the
114	         SORT.

116	    All servers and disconnected (as described in [IMAP-MODEL]) clients
117	    MUST use exactly this algorithm when sorting by subject.  Otherwise
118	    there is potential for a user to get inconsistent results based on
119	    whether they are running in connected or disconnected mode.
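
The extraction steps above can be sketched in Python as follows.  This is a minimal illustration, not a conforming implementation: the regular expressions are an informal reading of the subj-trailer, subj-leader, subj-blob, subj-fwd-hdr and subj-fwd-trl rules, whose formal ABNF lives in section 5 and is not reproduced here, and the RFC 2047 decoding in step (1) is assumed to have been done by the caller.  The names TRAILER, LEADER, BLOB and base_subject are hypothetical.

```python
import re

# Assumed shapes of the ABNF rules (see the lead-in; not taken verbatim
# from the formal syntax).
TRAILER = re.compile(r"(?:\(fwd\)|[ \t])$", re.IGNORECASE)
BLOB = re.compile(r"^\[[^\[\]]*\][ \t]*")
LEADER = re.compile(
    r"^(?:(?:\[[^\[\]]*\][ \t]*)*(?:re|fwd?)[ \t]*(?:\[[^\[\]]*\][ \t]*)?:|[ \t])",
    re.IGNORECASE,
)


def base_subject(subject):
    # Step 1: fold tabs, continuations and runs of spaces into one space.
    text = re.sub(r"[ \t\r\n]+", " ", subject)
    while True:
        # Step 2: strip trailers until none match.
        match = TRAILER.search(text)
        while match:
            text = text[:match.start()]
            match = TRAILER.search(text)
        # Steps 3-5: strip leaders and blobs until nothing changes.
        changed = True
        while changed:
            changed = False
            match = LEADER.match(text)
            if match:
                text = text[match.end():]
                changed = True
            match = BLOB.match(text)
            if match and text[match.end():]:
                # Only drop the blob if something is left afterwards.
                text = text[match.end():]
                changed = True
        # Step 6: unwrap "[fwd: ... ]" and start again from step 2.
        if text.lower().startswith("[fwd:") and text.endswith("]"):
            text = text[len("[fwd:"):-1]
            continue
        # Step 7: what remains is the base subject.
        return text
```

Under these assumptions, a subject such as "Re: [list] Fwd: Hello (fwd)" and a plain "Hello" reduce to the same base subject, which is what lets SORT and THREAD group them together.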

121	2.2 Sent Date

123	    As used in this document, the term "sent date" refers to the date and
124	    time from the Date: header, adjusted by time zone to normalize to UTC.
125	    For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the UTC
126	    date and time of "1 Jan 2001 00:01:33 +0000".

128	    If the time zone is invalid, the date and time SHOULD be treated as UTC.
129	    If the time is also invalid, the time SHOULD be treated as 00:00:00.  If
130	    there is no valid date or time, the date and time SHOULD be treated as
131	    00:00:00 on the earliest possible date.
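
As a rough illustration of the normalization described above, the following Python sketch converts a Date: header value to UTC using the standard library parser.  It covers only two of the fallbacks: a missing or unrecognized time zone is treated as UTC, and an unparseable header maps to the earliest representable date.  The intermediate case (valid date, invalid time) is not handled separately here.  The names sent_date and EARLIEST are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Stand-in for "00:00:00 on the earliest possible date".
EARLIEST = datetime.min.replace(tzinfo=timezone.utc)


def sent_date(date_header):
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # No valid date or time at all.
        return EARLIEST
    if parsed.tzinfo is None:
        # The parser found no usable zone: treat the value as UTC.
        return parsed.replace(tzinfo=timezone.utc)
    # Adjust by the stated zone so that all values compare in UTC.
    return parsed.astimezone(timezone.utc)
```

With this sketch, the example value "31 Dec 2000 16:01:33 -0800" compares equal to 1 Jan 2001 00:01:33 UTC.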

133	    This differs from the date-related criteria in the SEARCH command
134	    (described in [IMAP] section 6.4.4), which use just the date and not
135	    the time, and are not adjusted by time zone.

137	3. Additional Commands

139	    These commands are extensions to the [IMAP] base protocol.

141	    The section headings are intended to correspond with where they would
142	    be located in the main document if they were part of the base
143	    specification.

145	BASE.6.4.SORT. SORT Command

147	    Arguments:  sort program
148	                charset specification
149	                searching criteria (one or more)

151	    Data:       untagged responses: SORT

153	    Result:     OK - sort completed
154	                NO - sort error: can't sort that charset or
155	                     criteria
156	                BAD - command unknown or arguments invalid

158	       The SORT command is a variant of SEARCH with sorting semantics for
159	       the results.  Sort has two arguments before the searching criteria
160	       argument; a parenthesized list of sort criteria, and the searching
161	       charset.

163	       The charset argument is mandatory (unlike SEARCH) and indicates
164	       the [CHARSET] of the strings that appear in the searching criteria.
165	       The US-ASCII and UTF-8 charsets MUST be implemented.  All other
166	       charsets are optional.

168	       There is also a UID SORT command which returns unique identifiers
169	       instead of message sequence numbers.  Note that there are separate
170	       searching criteria for message sequence numbers and UIDs; thus the
171	       arguments to UID SORT are interpreted the same as in SORT.  This is
172	       analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID
173	       FETCH, or UID STORE.

175	       The SORT command first searches the mailbox for messages that
176	       match the given searching criteria using the charset argument for
177	       the interpretation of strings in the searching criteria.  It then
178	       returns the matching messages in an untagged SORT response, sorted
179	       according to one or more sort criteria.

181	       Sorting is in ascending order.  Earlier dates sort before later
182	       dates; smaller sizes sort before larger sizes; and strings are
183	       sorted according to ascending values established by their
184	       collation algorithm (see under "Internationalization
185	       Considerations").

187	       If two or more messages exactly match according to the sorting
188	       criteria, these messages are sorted according to the order in
189	       which they appear in the mailbox.  In other words, there is an
190	       implicit sort criterion of "sequence number".

192	       When multiple sort criteria are specified, the result is sorted in
193	       the priority order that the criteria appear.  For example,
194	       (SUBJECT DATE) will sort messages in order by their base subject
195	       text; and for messages with the same base subject text will sort
196	       by their sent date.

198	       Untagged EXPUNGE responses are not permitted while the server is
199	       responding to a SORT command, but are permitted during a UID SORT
200	       command.

202	       The defined sort criteria are as follows.  Refer to the Formal
203	       Syntax section for the precise syntactic definitions of the
204	       arguments.  If the associated RFC-822 header for a particular
205	       criterion is absent, it is treated as the empty string.  The empty
206	       string always collates before non-empty strings.

208	       ARRIVAL
209	          Internal date and time of the message.  This differs from the
210	          ON criterion in SEARCH, which uses just the internal date.

212	       CC
213	          RFC-822 local-part of the first "cc" address.

215	       DATE
216	          Sent date and time from the Date: header, adjusted by time
217	          zone.  This differs from the SENTON criterion in SEARCH, which
218	          ignores the time of day and is not adjusted by time zone.

220	       FROM
221	          RFC-822 local-part of the first "From" address.

223	       REVERSE
224	          Followed by another sort criterion, has the effect of that
225	          criterion but in reverse (descending) order.
226	             Note: REVERSE only reverses a single criterion, and does not
227	             affect the implicit "sequence number" sort criterion if all
228	             other criteria are identical.  Consequently, a sort of
229	             REVERSE SUBJECT is not the same as a reverse ordering of a
230	             SUBJECT sort.  This can be avoided by use of additional
231	             criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE
232	             DATE.  In general, however, it's better (and faster, if the
233	             client has a "reverse current ordering" command) to reverse
234	             the results in the client instead of issuing a new SORT.

236	       SIZE
237	          Size of the message in octets.

239	       SUBJECT
240	          Base subject text.

242	       TO
243	          RFC-822 local-part of the first "To" address.

245	    Example:    C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
246	                S: * SORT 2 84 882
247	                S: A282 OK SORT completed
248	                C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
249	                S: * SORT 5 3 4 1 2
250	                S: A283 OK SORT completed
251	                C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
252	                S: * SORT
253	                S: A284 OK SORT completed
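
The ordering rules for SORT (priority order of criteria, per-criterion REVERSE, and the implicit "sequence number" tie-breaker) can be illustrated with a short Python sketch.  It is only a model of the ordering semantics: the message representation (a dict with a "seq" key and one key per criterion) and the name sort_messages are assumptions made for the example, and plain Python comparison stands in for the collation algorithm.

```python
def sort_messages(messages, program):
    """Return sequence numbers ordered by a sort program.

    program is a list of (criterion, reverse) pairs in priority order,
    e.g. [("subject", False), ("date", True)] for (SUBJECT REVERSE DATE).
    """
    # Start from mailbox order: the implicit final sort criterion.
    result = sorted(messages, key=lambda m: m["seq"])
    # Apply criteria from lowest to highest priority.  Python's sort is
    # stable even with reverse=True, so ties keep the earlier ordering.
    for criterion, reverse in reversed(program):
        result.sort(key=lambda m: m[criterion], reverse=reverse)
    return [m["seq"] for m in result]


mailbox = [
    {"seq": 1, "subject": "b"},
    {"seq": 2, "subject": "a"},
    {"seq": 3, "subject": "b"},
]
ascending = sort_messages(mailbox, [("subject", False)])
descending = sort_messages(mailbox, [("subject", True)])
```

In this model, messages 1 and 3 share a subject, so they stay in sequence-number order in both results.  That is why REVERSE SUBJECT is not the same as reversing the output of a SUBJECT sort.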

255	BASE.6.4.THREAD. THREAD Command

257	Arguments:  threading algorithm
258	             charset specification
259	             searching criteria (one or more)

261	Data:       untagged responses: THREAD

263	Result:     OK - thread completed
264	             NO - thread error: can't thread that charset or
265	                  criteria
266	             BAD - command unknown or arguments invalid

268	       The THREAD command is a variant of SEARCH with threading semantics
269	       for the results.  Thread has two arguments before the searching
270	       criteria argument; a threading algorithm, and the searching
271	       charset.

273	       The charset argument is mandatory (unlike SEARCH) and indicates
274	       the [CHARSET] of the strings that appear in the searching criteria.
275	       The US-ASCII and UTF-8 charsets MUST be implemented.  All other
276	       charsets are optional.

278	       There is also a UID THREAD command which returns unique identifiers
279	       instead of message sequence numbers.  Note that there are separate
280	       searching criteria for message sequence numbers and UIDs; thus the
281	       arguments to UID THREAD are interpreted the same as in THREAD.  This is
282	       analogous to the behavior of UID SEARCH, as opposed to UID COPY, UID
283	       FETCH, or UID STORE.

285	       The THREAD command first searches the mailbox for messages that
286	       match the given searching criteria using the charset argument for
287	       the interpretation of strings in the searching criteria.  It then
288	       returns the matching messages in an untagged THREAD response,
289	       threaded according to the specified threading algorithm.

291	       All collation is in ascending order.  Earlier dates collate before
292	       later dates and strings are collated according to ascending values
293	       established by their collation algorithm (see under
294	       "Internationalization Considerations").

296	       The defined threading algorithms are as follows:

298	       ORDEREDSUBJECT

300	          The ORDEREDSUBJECT threading algorithm is also referred to as
301	          "poor man's threading."  The searched messages are sorted by
302	          base subject and then by the sent date.  The messages are then
303	          split into separate threads, with each thread containing
304	          messages with the same base subject text.  Finally, the threads
305	          are sorted by the sent date of the first message in the thread.

307	          The first messages of the threads are siblings of each other
308	          (the "root").  The second message of a thread is the child of
309	          the first message, and subsequent messages of the thread are
310	          siblings of the second message and hence children of the
311	          message at the root.  Hence, there are no grandchildren in
312	          ORDEREDSUBJECT threading.

314	            Note: early drafts of this specification specified
315	            that each message in an ORDEREDSUBJECT thread is a child
316	            (as opposed to a sibling) of the previous message.  This
317	            is now deprecated.  For compatibility with servers which
318	            may still use the old definition, client implementations
319	            SHOULD treat descendants of a child as being siblings of
320	            that child.

322	            This is because the old definition mistakenly indicated
323	            that there was a parent/child relationship between
324	            successive messages in a thread; when in fact there was
325	            only a chronological relationship.  In clients which
326	            indicate parent/child relationships in a thread tree,
327	            this would indicate levels of descent which did not
328	            exist.
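
The ORDEREDSUBJECT procedure can be sketched in a few lines of Python.  This is an illustration under stated assumptions: each message is a (sequence number, base subject, sent date) tuple, ordinary Python comparison stands in for the collation algorithm, ties are broken by sequence number, and the function name ordered_subject is hypothetical.

```python
from itertools import groupby


def ordered_subject(messages):
    """Return a list of (first message, [its children]) per thread."""
    # Sort by base subject, then sent date (sequence number breaks ties).
    ordered = sorted(messages, key=lambda m: (m[1], m[2], m[0]))
    # Split into threads of messages sharing the same base subject.
    threads = [list(group) for _, group in groupby(ordered, key=lambda m: m[1])]
    # Order the threads by the sent date of their first message.
    threads.sort(key=lambda thread: (thread[0][2], thread[0][0]))
    # The first message is the parent; all later ones are its children,
    # so no thread has grandchildren.
    return [(t[0][0], [m[0] for m in t[1:]]) for t in threads]
```

For instance, four messages where 1, 3 and 4 share a base subject yield one thread headed by the earliest of the three, with the other two as its direct children.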

330	       REFERENCES

332	          The REFERENCES threading algorithm is based on the [THREADING]
333	          algorithm used in "Netscape Mail and News" versions 2.0
334	          through 3.0.  This algorithm threads the searched messages by
335	          grouping them together in parent/child relationships based on
336	          which messages are replies to others.  The parent/child
337	          relationships are built using two methods: reconstructing a
338	          message's ancestry using the references contained within it;
339	          and checking the original (not base) subject of a message to
340	          see if it is a reply to (or forward of) another message.

342	             Note: "Message ID" in the following description refers to a
343	             normalized form of the msg-id in [RFC 2822].  The actual
344	             text of a msg-id may use quoting, resulting in multiple
345	             ways of expressing the same Message ID.  Implementations of
346	             the REFERENCES threading algorithm MUST normalize any msg-id
347	             in order to avoid false non-matches due to differences in
348	             quoting.

350	             For example, the msg-id
351	                <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
352	             and the msg-id
353	                <01KF8JCEOCBS0045PS@xxx.yyy.com>
354	             MUST be interpreted as being the same Message ID.
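
One possible way to normalize a msg-id is sketched below in Python.  It is a simplification, not a full RFC 2822 parser: it strips the angle brackets, splits at the last "@", and removes the surrounding quotes and backslash escapes from a quoted left-hand side.  The name normalize_msg_id is hypothetical.

```python
import re


def normalize_msg_id(raw):
    text = raw.strip()
    if text.startswith("<") and text.endswith(">"):
        text = text[1:-1]
    local, separator, domain = text.rpartition("@")
    if not separator:
        # Not of the form left@right; return it unchanged.
        return text
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        # Drop the quotes and undo quoted-pair escapes such as \" .
        local = re.sub(r"\\(.)", r"\1", local[1:-1])
    return local + "@" + domain
```

Both forms from the example above normalize to the same string in this sketch, so a comparison of the normalized values treats them as one Message ID.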

356	          The references used for reconstructing a message's ancestry are
357	          found using the following rules:

359	             If a message contains a References header line, then use the
360	             Message IDs in the References header line as the references.

362	             If a message does not contain a References header line, or
363	             the References header line does not contain any valid
364	             Message IDs, then use the first (if any) valid Message ID
365	             found in the In-Reply-To header line as the only reference
366	             (parent) for this message.

368	                Note: Although [RFC 2822] permits multiple Message IDs in
369	                the In-Reply-To header, in actual practice this
370	                discipline has not been followed.  For example,
371	                In-Reply-To headers have been observed with message
372	                addresses after the Message ID, and there are no good
373	                heuristics for software to determine the difference.
374	                This is not a problem with the References header however.

376	             If a message does not contain an In-Reply-To header line, or
377	             the In-Reply-To header line does not contain a valid Message
378	             ID, then the message does not have any references (NIL).
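
The three rules above amount to a simple fallback chain, sketched here in Python.  The regular expression MSG_ID is a deliberately loose assumption about what a "valid Message ID" looks like (it does not handle quoted forms), headers are modeled as a plain dict, and the name references_for is hypothetical.

```python
import re

# Loose stand-in for a valid msg-id: <left@right> with no spaces.
MSG_ID = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def references_for(headers):
    # Rule 1: use every valid Message ID in the References header.
    references = MSG_ID.findall(headers.get("References", ""))
    if references:
        return references
    # Rule 2: otherwise use only the first valid ID in In-Reply-To.
    in_reply_to = MSG_ID.findall(headers.get("In-Reply-To", ""))
    # Rule 3: an empty list means the message has no references (NIL).
    return in_reply_to[:1]
```

Taking only the first In-Reply-To match reflects the note above: anything after the first Message ID in that header cannot be trusted.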

380	          A message is considered to be a reply or forward if the base
381	          subject extraction rules, applied to the original subject,
382	          remove any of the following: a subj-refwd, a "(fwd)"
383	          subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.

385	          The REFERENCES algorithm is significantly more complex than
386	          ORDEREDSUBJECT and consists of six main steps.  These steps are
387	          outlined in detail below.

389	          (1) For each searched message:

391	             (A) Using the Message IDs in the message's references, link
392	             the corresponding messages (those whose Message-ID header
393	             line contains the given reference Message ID) together as
394	             parent/child.  Make the first reference the parent of the
395	             second (and the second a child of the first), the second the
396	             parent of the third (and the third a child of the second),
397	             etc.  The following rules govern the creation of these
398	             links:

400	                If a message does not contain a Message-ID header line,
401	                or the Message-ID header line does not contain a valid
402	                Message ID, then assign a unique Message ID to this
403	                message.

405	                If two or more messages have the same Message ID, then
406	                only use that Message ID in the first (lowest sequence
407	                number) message, and assign a unique Message ID to each
408	                of the subsequent messages with a duplicate of that
409	                Message ID.

411	                If no message can be found with a given Message ID,
412	                create a dummy message with this ID.  Use this dummy
413	                message for all subsequent references to this ID.

415	                If a message already has a parent, don't change the
416	                existing link.  This is done because the References
417	                header line may have been truncated by a MUA.  As a
418	                result, there is no guarantee that the messages
419	                corresponding to adjacent Message IDs in the References
420	                header line are parent and child.

422	                Do not create a parent/child link if creating that link
423	                would introduce a loop.  For example, before making
424	                message A the parent of B, make sure that A is not a
425	                descendant of B.

427	                   Note: Message ID comparisons are case-sensitive.

429	             (B) Create a parent/child link between the last reference
430	             (or NIL if there are no references) and the current message.
431	             If the current message already has a parent, it is probably
432	             the result of a truncated References header line, so break
433	             the current parent/child link before creating the new
434	             correct one.  As in step 1.A, do not create the parent/child
435	             link if creating that link would introduce a loop.  Note
436	             that if this message has no references, it will now
437	             have no parent.

439	                Note: The parent/child links created in steps 1.A and 1.B
440	                MUST be kept consistent with one another at ALL times.
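
Step (1) can be sketched in Python as follows.  The sketch assumes that every message already carries a unique, normalized Message ID (that is, the rules about missing and duplicate IDs have been applied beforehand) and models each message as a (Message ID, list of references) pair.  The Container class and the function names are hypothetical.

```python
class Container:
    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.is_dummy = True   # set to False once the real message is seen
        self.parent = None
        self.children = []


def would_loop(parent, child):
    # Linking is unsafe if parent is child or a descendant of child.
    node = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def attach(parent, child):
    child.parent = parent
    parent.children.append(child)


def detach(child):
    child.parent.children.remove(child)
    child.parent = None


def link_references(messages):
    table = {}

    def lookup(msg_id):
        # Unknown IDs get a dummy container that later references reuse.
        return table.setdefault(msg_id, Container(msg_id))

    for msg_id, references in messages:
        chain = [lookup(reference) for reference in references]
        # Step 1.A: link adjacent references, keeping existing parents.
        for parent, child in zip(chain, chain[1:]):
            if child.parent is None and not would_loop(parent, child):
                attach(parent, child)
        # Step 1.B: the last reference becomes the parent of this message,
        # replacing any link created from a truncated References header.
        current = lookup(msg_id)
        current.is_dummy = False
        if current.parent is not None:
            detach(current)
        if chain and not would_loop(chain[-1], current):
            attach(chain[-1], current)
    return table
```

Keeping the parent pointer and the children list updated together in attach and detach is one way to satisfy the consistency requirement in the note above.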

442	          (2) Gather together all of the messages that have no parents
443	          and make them all children (siblings of one another) of a dummy
444	          parent (the "root").  These messages constitute the first
445	          (head) message of the threads created thus far.

447	          (3) Prune dummy messages from the thread tree.  Traverse each
448	          thread under the root, and for each message:

450	             If it is a dummy message with NO children, delete it.

452	             If it is a dummy message with children, delete it, but
453	             promote its children to the current level.  In other words,
454	             splice them in with the dummy's siblings.

456	             Do not promote the children if doing so would make them
457	             children of the root, unless there is only one child.
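
The pruning rules in step (3) can be sketched as a recursive pass over the tree.  The Node class and the prune function are hypothetical, and the sketch assumes the tree is held as simple parent-to-children lists under a root node.

```python
class Node:
    def __init__(self, name, dummy=False, children=()):
        self.name = name
        self.dummy = dummy
        self.children = list(children)


def prune(parent, at_root=True):
    kept = []
    for child in parent.children:
        # Prune the subtree first so nested dummies are already gone.
        prune(child, at_root=False)
        if child.dummy and not child.children:
            continue                      # dummy with no children: delete
        if child.dummy and (not at_root or len(child.children) == 1):
            kept.extend(child.children)   # delete dummy, promote children
            continue
        # Real messages stay, as do top-level dummies with several
        # children (promoting those would make them children of the root).
        kept.append(child)
    parent.children = kept
```

Pruning bottom-up is a design choice of this sketch: it means that by the time a dummy is examined, its own child list contains only real messages.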

459	          (4) Sort the messages under the root (top-level siblings only)
460	          by sent date.  In the case of an exact match on sent date or if
461	          either of the Date: headers used in a comparison cannot be
462	          parsed, use the order in which the messages appear in the
463	          mailbox (that is, by sequence number) to determine the order.
464	          In the case of a dummy message, sort its children by sent date
465	          and then use the first child for the top-level sort.

467	          (5) Gather together messages under the root that have the same
468	          base subject text.

470	             (A) Create a table for associating base subjects with
471	             messages, called the subject table.

473	             (B) Populate the subject table with one message per each
474	             base subject.  For each child of the root:

476	                (i) Find the subject of this thread, by using the base
477	                subject from either the current message or its first
478	                child if the current message is a dummy.  This is the
479	                thread subject.

481	                (ii) If the thread subject is empty, skip this message.

483	                (iii) Look up the message associated with the thread
484	                subject in the subject table.

486	                (iv) If there is no message in the subject table with the
487	                thread subject, add the current message and the thread
488	                subject to the subject table.

490	                Otherwise, if the message in the subject table is not a
491	                dummy, AND either of the following criteria is true:

493	                   The current message is a dummy, OR

495	                   The message in the subject table is a reply or forward
496	                   and the current message is not.

498	                then replace the message in the subject table with the
499	                current message.

501	             (C) Merge threads with the same thread subject.  For each
502	             child of the root:

504	                (i) Find the message's thread subject as in step 5.B.i
505	                above.

507	                (ii) If the thread subject is empty, skip this message.

509	                (iii) Look up the message associated with this thread
510	                subject in the subject table.

512	                (iv) If the message in the subject table is the current
513	                message, skip this message.

515	                Otherwise, merge the current message with the one in the
516	                subject table using the following rules:

518	                   If both messages are dummies, append the current
519	                   message's children to the children of the message in
520	                   the subject table (the children of both messages
521	                   become siblings), and then delete the current message.

523	                   If the message in the subject table is a dummy and the
524	                   current message is not, make the current message a
525	                   child of the message in the subject table (a sibling
526	                   of its children).

528	                   If the current message is a reply or forward and the
529	                   message in the subject table is not, make the current
530	                   message a child of the message in the subject table (a
531	                   sibling of its children).

533	                   Otherwise, create a new dummy message and make both
534	                   the current message and the message in the subject
535	                   table children of the dummy.  Then replace the message
536	                   in the subject table with the dummy message.

538	                      Note: Subject comparisons are case-insensitive, as
539	                      described under "Internationalization
540	                      Considerations."

542	          (6) Traverse the messages under the root and sort each set of
543	          siblings by sent date.  Traverse the messages in such a way
544	          that the "youngest" set of siblings are sorted first, and the
545	          "oldest" set of siblings are sorted last (grandchildren are
546	          sorted before children, etc.).  In the case of an exact match on
547	          sent date or if either of the Date: headers used in a
548	          comparison cannot be parsed, use the order in which the
549	          messages appear in the mailbox (that is, by sequence number) to
550	          determine the order.  In the case of a dummy message (which can
551	          only occur with top-level siblings), use its first child for
552	          sorting.
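
The pruning rule of step (3) and the ordering rule shared by steps (4)
and (6) can be sketched as follows.  This is a minimal illustration, not
a reference implementation: the Node class and the prune and
sort_siblings helpers are hypothetical names, a dummy is modelled as a
node without a sequence number, and all parsed dates are assumed to be
mutually comparable (all naive or all timezone-aware).

```python
from dataclasses import dataclass, field
from datetime import datetime
from functools import cmp_to_key
from typing import List, Optional


@dataclass
class Node:
    """One entry in the thread tree (hypothetical type); a dummy has no seq."""
    seq: Optional[int] = None          # message sequence number, None for a dummy
    sent: Optional[datetime] = None    # parsed Date: header, None if unparsable
    children: List["Node"] = field(default_factory=list)

    @property
    def is_dummy(self) -> bool:
        return self.seq is None


def prune(parent: Node, at_root: bool = True) -> None:
    """Step (3): delete dummies, promoting their children where allowed."""
    kept: List[Node] = []
    for child in parent.children:
        prune(child, at_root=False)        # clean up the subtree first
        if not child.is_dummy:
            kept.append(child)
        elif not child.children:
            continue                       # childless dummy: delete it
        elif at_root and len(child.children) > 1:
            kept.append(child)             # promotion would add several children to the root
        else:
            kept.extend(child.children)    # splice children in with the dummy's siblings
    parent.children = kept


def _representative(node: Node) -> Node:
    # A top-level dummy is ordered by its first (already sorted) child.
    return node.children[0] if node.is_dummy else node


def _compare(a: Node, b: Node) -> int:
    a, b = _representative(a), _representative(b)
    if a.sent is not None and b.sent is not None and a.sent != b.sent:
        return -1 if a.sent < b.sent else 1
    return a.seq - b.seq                   # tie or unparsable date: mailbox order


def sort_siblings(parent: Node) -> None:
    """Step (6): sort the deepest sets of siblings first, then their parents."""
    for child in parent.children:
        sort_siblings(child)
    parent.children.sort(key=cmp_to_key(_compare))
```

Because the sequence-number fallback applies per comparison, the sketch
uses a comparison function rather than a single sort key.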

554	    Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
555	                S: * THREAD (166)(167)(168)(169)(172)(170)(171)
556	                   (173)(174 (175)(176)(178)(181)(180))(179)(177
557	                   (183)(182)(188)(184)(185)(186)(187)(189))(190)
558	                   (191)(192)(193)(194 195)(196 (197)(198))(199)
559	                   (200 202)(201)(203)(204)(205)(206 207)(208)
560	                S: A283 OK THREAD completed
561	                C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
562	                S: * THREAD
563	                S: A284 OK THREAD completed
564	                C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
565	                S: * THREAD (166)(167)(168)(169)(172)((170)(179))
566	                   (171)(173)((174)(175)(176)(178)(181)(180))
567	                   ((177)(183)(182)(188 (184)(189))(185 186)(187))
568	                   (190)(191)(192)(193)((194)(195 196))(197 198)
569	                   (199)(200 202)(201)(203)(204)(205 206 207)(208)
570	                S: A285 OK THREAD completed

572	         Note: The line breaks in the first and third server
573	         responses are for editorial clarity and do not appear in
574	         real THREAD responses.

576	4. Additional Responses

578	    These responses are extensions to the [IMAP] base protocol.

580	    The section headings of these responses are intended to correspond
581	    with where they would be located in the main document.

583	BASE.7.2.SORT. SORT Response

585	    Data:       zero or more numbers

587	       The SORT response occurs as a result of a SORT or UID SORT
588	       command.  The number(s) refer to those messages that match the
589	       search criteria.  For SORT, these are message sequence numbers;
590	       for UID SORT, these are unique identifiers.  Each number is
591	       delimited by a space.

593	    Example:    S: * SORT 2 3 6
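
A client might decode the untagged SORT response along the following
lines.  This is a sketch only; parse_sort is a hypothetical helper name,
and the input is assumed to be a single response line with the CRLF
already removed.

```python
from typing import List


def parse_sort(line: str) -> List[int]:
    """Return the message numbers (or UIDs) carried by an untagged SORT response."""
    parts = line.split()
    if parts[:2] != ["*", "SORT"]:
        raise ValueError("not an untagged SORT response: %r" % line)
    # Zero numbers is valid: no message matched the search criteria.
    return [int(p) for p in parts[2:]]
```

For the example above, parse_sort("* SORT 2 3 6") yields the list
2, 3, 6 in server order.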

595	BASE.7.2.THREAD. THREAD Response

597	    Data:       zero or more threads

599	       The THREAD response occurs as a result of a THREAD or UID THREAD
600	       command.  It contains zero or more threads.  A thread consists of
601	       a parenthesized list of thread members.

603	       Thread members consist of zero or more message numbers, delimited
604	       by spaces, indicating successive parent and child.  This continues
605	       until the thread splits into multiple sub-threads, at which point
606	       the thread nests into multiple sub-threads with the first member
607	       of each subthread being siblings at this level.  There is no limit
608	       to the nesting of threads.

610	       The message numbers refer to those messages that match the search
611	       criteria.  For THREAD, these are message sequence numbers; for UID
612	       THREAD, these are unique identifiers.

614	    Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

616	       The first thread consists only of message 2.  The second thread
617	       consists of the messages 3 (parent) and 6 (child), after which it
618	       splits into two subthreads; the first of which contains messages 4
619	       (child of 6, sibling of 44) and 23 (child of 4), and the second of
620	       which contains messages 44 (child of 6, sibling of 4), 7 (child of
621	       44), and 96 (child of 7).  Since some later messages are parents
622	       of earlier messages, the messages were probably moved from some
623	       other mailbox at different times.

625	       -- 2

627	       -- 3
628	          \-- 6
629	              |-- 4
630	              |   \-- 23
631	              |
632	              \-- 44
633	                   \-- 7
634	                       \-- 96

636	    Example:    S: * THREAD ((3)(5))

638	       In this example, 3 and 5 are siblings of a parent which does not
639	       match the search criteria (and/or does not exist in the mailbox);
640	       however they are members of the same thread.
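
The nesting rules can be made concrete with a small recursive parser.
The sketch below is illustrative only: parse_thread is a hypothetical
helper, each node is represented as a (number, children) tuple, and a
parent that is absent from the response (as in the second example) is
represented by the number None.

```python
import re
from typing import List, Optional, Tuple

Node = Tuple[Optional[int], list]      # (message number or None, child nodes)

_TOKEN = re.compile(r"\d+|[()]")


def parse_thread(line: str) -> List[Node]:
    """Parse an untagged THREAD response into a list of thread trees."""
    _, keyword, rest = line.partition("THREAD")
    if not keyword:
        raise ValueError("not a THREAD response: %r" % line)
    tokens = _TOKEN.findall(rest)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def thread_list() -> Node:
        nonlocal pos
        if peek() != "(":
            raise ValueError("expected '('")
        pos += 1
        numbers = []
        while peek() is not None and peek().isdigit():
            numbers.append(int(tokens[pos]))   # successive parent and child
            pos += 1
        children = []
        while peek() == "(":
            children.append(thread_list())     # sub-threads are siblings
        if peek() != ")":
            raise ValueError("expected ')'")
        pos += 1
        if not numbers:
            return (None, children)            # siblings without a listed parent
        node: Node = (numbers[-1], children)
        for number in reversed(numbers[:-1]):
            node = (number, [node])            # fold the chain back up to its head
        return node

    threads = []
    while peek() is not None:
        threads.append(thread_list())
    return threads
```

Applied to the first example, the result has two top-level entries: a
lone message 2, and message 3 whose only child 6 has the two children 4
and 44.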

642	5. Formal Syntax of SORT and THREAD Commands and Responses

644	    The following syntax specification uses the Augmented Backus-Naur
645	    Form (ABNF) notation as specified in [ABNF].  It also uses [ABNF]
646	    rules defined in [IMAP].

648	sort            = ["UID" SP] "SORT" SP sort-criteria SP search-criteria

650	sort-criteria   = "(" sort-criterion *(SP sort-criterion) ")"

652	sort-criterion  = ["REVERSE" SP] sort-key

654	sort-key        = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
655	                   "SUBJECT" / "TO"

657	thread          = ["UID" SP] "THREAD" SP thread-alg SP search-criteria

659	thread-alg      = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext

661	thread-alg-ext  = atom
662	                     ; New algorithms MUST be registered with IANA

664	search-criteria = charset 1*(SP search-key)

666	charset         = astring
667	                     ; CHARSET values MUST be registered with IANA

669	sort-data       = "SORT" *(SP nz-number)

671	thread-data     = "THREAD" [SP 1*thread-list]

673	thread-list     = "(" (thread-members / thread-nested) ")"

675	thread-members  = nz-number *(SP nz-number) [SP thread-nested]

677	thread-nested   = 2*thread-list
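
As an illustration of the command syntax above, a client could assemble
the text of a SORT command as sketched below.  The build_sort helper and
its parameters are hypothetical; the search keys are assumed to be
already formatted according to [IMAP], and the command tag is left to
the caller.

```python
from typing import Iterable, List, Tuple

SORT_KEYS = {"ARRIVAL", "CC", "DATE", "FROM", "SIZE", "SUBJECT", "TO"}


def build_sort(criteria: Iterable[Tuple[str, bool]], charset: str,
               search_keys: List[str], uid: bool = False) -> str:
    """Build the text of a SORT or UID SORT command (without the tag).

    criteria is a sequence of (sort-key, reverse) pairs.
    """
    parts = []
    for key, reverse in criteria:
        key = key.upper()
        if key not in SORT_KEYS:
            raise ValueError("unknown sort key: %s" % key)
        parts.append("REVERSE " + key if reverse else key)
    if not parts:
        raise ValueError("at least one sort criterion is required")
    if not search_keys:
        raise ValueError("at least one search key is required")
    command = "SORT (%s) %s %s" % (" ".join(parts), charset, " ".join(search_keys))
    return "UID " + command if uid else command
```

For instance, build_sort([("SUBJECT", False), ("DATE", True)], "UTF-8",
["SINCE", "1-Feb-1994"]) produces
SORT (SUBJECT REVERSE DATE) UTF-8 SINCE 1-Feb-1994.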

679	    The following syntax describes base subject extraction rules (2)-(6):

681	subject         = *subj-leader [subj-middle] *subj-trailer

683	subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

685	subj-blob       = "[" *BLOBCHAR "]" *WSP

687	subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

689	subj-fwd-hdr    = "[fwd:"

691	subj-fwd-trl    = "]"

693	subj-leader     = (*subj-blob subj-refwd) / WSP

695	subj-middle     = *subj-blob (subj-base / subj-fwd)
696	                     ; last subj-blob is subj-base if subj-base would
697	                     ; otherwise be empty

699	subj-trailer    = "(fwd)" / WSP

701	subj-base       = NONWSP *(*WSP NONWSP)
702	                     ; can be a subj-blob

704	BLOBCHAR        = %x01-5a / %x5c / %x5e-7f
705	                     ; any CHAR except '[' and ']'

707	NONWSP          = %x01-08 / %x0a-1f / %x21-7f
708	                     ; any CHAR other than WSP
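
The subject grammar can be approximated with regular expressions, as in
the sketch below.  Several things here are assumptions rather than part
of this section: the base_subject helper name, the order in which the
loop applies extraction rules (2)-(6) (those rules are defined earlier
in the document and are not reproduced here), and the simplified
whitespace handling.  Decoding of encoded-words is omitted entirely.

```python
import re

# subj-blob: "[" *BLOBCHAR "]" *WSP
_BLOB = r"\[[^\[\]]*\][ \t]*"
# subj-refwd: ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
_REFWD = rf"(?:re|fwd?)[ \t]*(?:{_BLOB})?:"
# subj-leader: (*subj-blob subj-refwd) / WSP
_LEADER = re.compile(rf"(?:(?:{_BLOB})*{_REFWD}|[ \t])", re.IGNORECASE)
# subj-trailer: "(fwd)" / WSP, anchored at the end of the subject
_TRAILER = re.compile(r"(?:\(fwd\)|[ \t])$", re.IGNORECASE)
_BLOB_RE = re.compile(_BLOB)


def base_subject(subject: str) -> str:
    """Approximate base subject extraction (assumed reading of rules 2-6)."""
    s = re.sub(r"[ \t]+", " ", subject)        # collapse runs of whitespace
    while True:
        while True:                            # strip every trailer
            m = _TRAILER.search(s)
            if not m:
                break
            s = s[:m.start()]
        changed = True
        while changed:
            changed = False
            m = _LEADER.match(s)
            while m:                           # strip every leader
                s = s[m.end():]
                changed = True
                m = _LEADER.match(s)
            m = _BLOB_RE.match(s)
            if m and s[m.end():]:              # keep the blob if nothing else remains
                s = s[m.end():]
                changed = True
        if s[:5].lower() == "[fwd:" and s.endswith("]"):
            s = s[5:-1]                        # unwrap "[fwd: ... ]" and start over
            continue
        return s
```

Under these assumptions, "Re: [list] Fwd: hello (fwd)" and
"[fwd: Re: hello]" both reduce to "hello", while a subject consisting
only of "[blob]" is left unchanged.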

710	6. Security Considerations

712	    The SORT and THREAD extensions do not raise any security
713	    considerations that are not present in the base [IMAP] protocol, and
714	    these issues are discussed in [IMAP].  Nevertheless, it is important
715	    to remember that [IMAP] protocol transactions, including message
716	    data, are sent in the clear over the network unless protection from
717	    snooping is negotiated, whether by the use of STARTTLS, by privacy
718	    protection negotiated in the AUTHENTICATE command, or by some other
719	    protection mechanism.

721	7. Internationalization Considerations

723	    As described in [IMAP-I18N], strings in charsets other than US-ASCII
724	    and UTF-8 MUST be converted to UTF-8 and compared in ascending order
725	    according to the selected or active collation algorithm.  If the server
726	    does not support the [IMAP-I18N] COMPARATOR extension, the collation
727	    algorithm used is the "en;ascii-casemap" collation described in
728	    [COMPARATOR].

730	    Translations of the "re" or "fw"/"fwd" tokens are not specified for
731	    removal in the base subject extraction process.  An attempt to add such
732	    translated tokens would result in a geometrically complex, and
733	    ultimately unimplementable, task.

735	    Instead, note that [RFC-2822] section 3.6.5 recommends that "re:" (from
736	    the Latin "res", in the matter of) be used to identify a reply.
737	    Although the multiple forms of token used to identify a forwarded
738	    message show that there is considerable variation in the wild,
739	    the variations are (still) manageable.  Consequently, it is
740	    suggested that "re:" and one of the variations of the tokens for forward
741	    supported by the base subject extraction rules be adopted for Internet
742	    mail messages, since doing so makes it a simple display time task to
743	    localize the token language for the user.

745	8. IANA Considerations

747	    [IMAP] capabilities are registered by publishing a standards track or
748	    IESG approved experimental RFC.  This document constitutes registration
749	    of the SORT and THREAD capabilities in the [IMAP] capabilities registry.

751	    This document creates a new [IMAP] threading algorithms registry, which
752	    registers threading algorithms by publishing a standards track or IESG
753	    approved experimental RFC.  This document constitutes registration of
754	    the ORDEREDSUBJECT and REFERENCES algorithms in that registry.

756	Appendices

758	A. Normative References

760	    The following documents are normative to this document:

762	    [ABNF]                Crocker, D. and Overell, P. "Augmented BNF
763	                          for Syntax Specifications: ABNF", RFC 2234,
764	                          November 1997.

766	    [CHARSET]             Freed, N. and J. Postel, "IANA Character Set
767	                          Registration Procedures", RFC 2978, October
768	                          2000.

770	    [COMPARATOR]          Newman, C. "Internet Application Protocol
771	                          Collation Registry", Work in Progress.

773	    [IMAP]                Crispin, M. "Internet Message Access Protocol -
774	                          Version 4rev1", RFC 3501, March 2003.

776	    [IMAP-I18N]           Newman, C. "Internet Message Access Protocol
777	                          Internationalization", Work in Progress.

779	    [KEYWORDS]            Bradner, S. "Key words for use in RFCs to
780	                          Indicate Requirement Levels", RFC 2119, Harvard
781	                          University, March 1997.

783	    [RFC-2822]            Resnick, P. "Internet Message Format", RFC 2822,
784	                          April 2001.

786	B. Informative References

788	    The following documents are informative to this document:

790	    [IMAP-MODELS]         Crispin, M. "Distributed Electronic Mail Models
791	                          in IMAP4", RFC 1733, December 1994.

793	    [THREADING]           Zawinski, J. "Message Threading",
794	                          http://www.jwz.org/doc/threading.html, 1997-2002.

796	Author's Address

798	    Mark R. Crispin
799	    Networks and Distributed Computing
800	    University of Washington
801	    4545 15th Avenue NE
802	    Seattle, WA  98105-4527

804	    Phone: (206) 543-5762

806	    EMail: MRC@CAC.Washington.EDU

808	    Kenneth Murchison
809	    Oceana Matrix Ltd.
810	    21 Princeton Place
811	    Orchard Park, NY 14127

813	    Phone: (716) 662-8973 x26

815	    EMail: ken@oceana.com

817	IPR Disclosure Acknowledgement

819	    By submitting this Internet-Draft, I certify that any applicable
820	    patent or other IPR claims of which I am aware have been disclosed,
821	    and any of which I become aware will be disclosed, in accordance with
822	    RFC 3668.

824	Intellectual Property Statement

826	    The IETF takes no position regarding the validity or scope of any
827	    intellectual property or other rights that might be claimed to
828	    pertain to the implementation or use of the technology described in
829	    this document or the extent to which any license under such rights
830	    might or might not be available; neither does it represent that it
831	    has made any effort to identify any such rights. Information on the
832	    IETF's procedures with respect to rights in standards-track and
833	    standards-related documentation can be found in BCP-11. Copies of
834	    claims of rights made available for publication and any assurances of
835	    licenses to be made available, or the result of an attempt made to
836	    obtain a general license or permission for the use of such
837	    proprietary rights by implementors or users of this specification can
838	    be obtained from the IETF Secretariat.

840	    The IETF invites any interested party to bring to its attention any
841	    copyrights, patents or patent applications, or other proprietary
842	    rights which may cover technology that may be required to practice
843	    this standard. Please address the information to the IETF Executive
844	    Director.

846	Full Copyright Statement

848	    Copyright (C) The Internet Society (2004).  This document is subject
849	    to the rights, licenses and restrictions contained in BCP 78 and
850	    except as set forth therein, the authors retain all their rights.

852	    This document and the information contained herein are provided on an
853	    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
854	    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
855	    INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
856	    IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
857	    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
858	    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

860	Intellectual Property

862	    The IETF takes no position regarding the validity or scope of any
863	    Intellectual Property Rights or other rights that might be claimed
864	    to pertain to the implementation or use of the technology
865	    described in this document or the extent to which any license
866	    under such rights might or might not be available; nor does it
867	    represent that it has made any independent effort to identify any
868	    such rights.  Information on the procedures with respect to
869	    rights in RFC documents can be found in BCP 78 and BCP 79.

871	    Copies of IPR disclosures made to the IETF Secretariat and any
872	    assurances of licenses to be made available, or the result of an
873	    attempt made to obtain a general license or permission for the use
874	    of such proprietary rights by implementers or users of this
875	    specification can be obtained from the IETF on-line IPR repository
876	    at http://www.ietf.org/ipr.

878	    The IETF invites any interested party to bring to its attention
879	    any copyrights, patents or patent applications, or other
880	    proprietary rights that may cover technology that may be required
881	    to implement this standard.  Please address the information to the
882	    IETF at ietf-ipr@ietf.org.

884	Acknowledgement

886	    Funding for the RFC Editor function is currently provided by the
887	    Internet Society.