idnits 2.17.1 

draft-ietf-imapext-sort-20.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 856.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 867.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 874.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 880.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 886 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([IMAP]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 10, 2008) is 5884 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 3501 (ref. 'IMAP') (Obsoleted by RFC
     9051)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IMAP-I18N'

  ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322)


     Summary: 5 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	IMAP Extensions Working Group                                 M. Crispin
2	Internet-Draft                                              K. Murchison
3	Intended status: Proposed Standard                        March 10, 2008
4	Expires: September 10, 2008
5	Document: internet-drafts/draft-ietf-imapext-sort-20.txt

7	     INTERNET MESSAGE ACCESS PROTOCOL - SORT AND THREAD EXTENSIONS

9	Status of this Memo

11	   By submitting this Internet-Draft, each author represents that
12	   any applicable patent or other IPR claims of which he or she is
13	   aware have been or will be disclosed, and any of which he or she
14	   becomes aware will be disclosed, in accordance with Section 6 of
15	   BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as
20	   Internet-Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   A revised version of this draft document will be submitted to the RFC
34	   editor as a Proposed Standard for the Internet Community.  Discussion
35	   and suggestions for improvement are requested, and should be sent to
36	   ietf-imapext@IMC.ORG.

38	   Distribution of this memo is unlimited.

40	Abstract

42	   This document describes the base-level server-based sorting and
43	   threading extensions to the [IMAP] protocol.  These extensions
44	   provide substantial performance improvements for IMAP clients which
45	   offer sorted and threaded views.

47	1. Introduction

49	   The SORT and THREAD extensions to the [IMAP] protocol provide a means
50	   of server-based sorting and threading of messages, without requiring
51	   that the client download the necessary data to do so itself.  This is
52	   particularly useful for online clients as described in [IMAP-MODELS].

54	   A server which supports the base-level SORT extension indicates this
55	   with a capability name which starts with "SORT".  Future,
56	   upwards-compatible extensions to the SORT extension will all start
57	   with "SORT", indicating support for this base level.

59	   A server which supports the THREAD extension indicates this with one
60	   or more capability names consisting of "THREAD=" followed by a
61	   supported threading algorithm name as described in this document.
62	   This provides for future upwards-compatible extensions.

64	   A server which implements the SORT and/or THREAD extensions MUST
65	   collate strings in accordance with the requirements of I18NLEVEL=1,
66	   as described in [IMAP-I18N], and SHOULD implement and advertise the
67	   I18NLEVEL=1 extension.  Alternatively, a server MAY implement
68	   I18NLEVEL=2 (or higher) and comply with the rules of that level.

70	      Discussion: the SORT and THREAD extensions predate [IMAP-I18N] by
71	      several years.  At the time of this writing, all known server
72	      implementations of SORT and THREAD comply with the rules of
73	      I18NLEVEL=1, but do not necessarily advertise it.  As discussed
74	      in [IMAP-I18N] section 4.5, all server implementations should
75	      eventually be updated to comply with the I18NLEVEL=2 extension.

77	   Historical note: the REFERENCES threading algorithm is based on the
78	   [THREADING] algorithm written and used in "Netscape Mail and News"
79	   versions 2.0 through 3.0.

81	2. Terminology

83	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
84	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
85	   document are to be interpreted as described in [KEYWORDS].

87	   The word "can" (not "may") is used to refer to a possible
88	   circumstance or situation, as opposed to an optional facility of the
89	   protocol.

91	   "User" is used to refer to a human user, whereas "client" refers to
92	   the software being run by the user.

94	   In examples, "C:" and "S:" indicate lines sent by the client and
95	   server respectively.

97	2.1 Base Subject

99	   Subject sorting and threading use the "base subject," which has
100	   specific subject artifacts removed.  Due to the complexity of these
101	   artifacts, the formal syntax for the subject extraction rules is
102	   ambiguous.  The following procedure is followed to determine the
103	   "base subject", using the [ABNF] formal syntax rules described in
104	   section 5:

106	        (1) Convert any RFC 2047 encoded-words in the subject to
107	        UTF-8 as described in "internationalization
108	        considerations."  Convert all tabs and continuations to
109	        space.  Convert all multiple spaces to a single space.

111	        (2) Remove all trailing text of the subject that matches
112	        the subj-trailer ABNF, repeat until no more matches are
113	        possible.

115	        (3) Remove all prefix text of the subject that matches the
116	        subj-leader ABNF.

118	        (4) If there is prefix text of the subject that matches the
119	        subj-blob ABNF, and removing that prefix leaves a non-empty
120	        subj-base, then remove the prefix text.

122	        (5) Repeat (3) and (4) until no matches remain.

124	   Note: it is possible to defer step (2) until step (6), but this
125	   requires checking for subj-trailer in step (4).

127	        (6) If the resulting text begins with the subj-fwd-hdr ABNF
128	        and ends with the subj-fwd-trl ABNF, remove the
129	        subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

131	        (7) The resulting text is the "base subject" used in the
132	        SORT.

134	   All servers and disconnected (as described in [IMAP-MODELS]) clients
135	   MUST use exactly this algorithm to determine the "base subject".
136	   Otherwise there is potential for a user to get inconsistent results
137	   based on whether they are running in connected or disconnected mode.
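
The procedure in section 2.1 can be sketched in Python as follows.  This
is an illustrative sketch only, not a reference implementation.  The
subj-* ABNF rules live in section 5, which is not reproduced here, so
the regular expressions below are an assumed reading of those rules (a
blob is taken to be bracketed text without nested brackets).  The
function name base_subject is hypothetical, and step (1)'s RFC 2047
decoding is assumed to have been done by the caller.

```python
import re

# Assumed approximations of the section 5 ABNF (not quoted from it).
BLOB = r"\[[^\[\]]*\][ \t]*"                      # subj-blob
REFWD = r"(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:"   # subj-refwd
LEADER = re.compile(r"(?:" + BLOB + r")*" + REFWD + r"|[ \t]", re.IGNORECASE)
BLOB_RE = re.compile(BLOB)
TRAILER = re.compile(r"(?:\(fwd\)|[ \t])$", re.IGNORECASE)
FWD = re.compile(r"\[fwd:(.*)\]$", re.IGNORECASE | re.DOTALL)


def base_subject(subject):
    """Return the base subject of an already-decoded subject string."""
    # Step (1): tabs and continuations to space, collapse space runs.
    s = re.sub(r"[\t\r\n]+", " ", subject)
    s = re.sub(r" {2,}", " ", s)
    while True:
        # Step (2): strip trailers until none match.
        while True:
            stripped = TRAILER.sub("", s)
            if stripped == s:
                break
            s = stripped
        # Steps (3)-(5): strip leaders and blobs until nothing changes.
        changed = True
        while changed:
            changed = False
            m = LEADER.match(s)
            if m:
                s = s[m.end():]
                changed = True
            m = BLOB_RE.match(s)
            if m and s[m.end():]:      # keep a blob that is the whole subject
                s = s[m.end():]
                changed = True
        # Step (6): unwrap "[fwd: ... ]" and start again from step (2).
        m = FWD.match(s)
        if not m:
            return s                   # step (7)
        s = m.group(1)
```

For instance, base_subject("Re: [list] Re: Hello  world (fwd)") yields
"Hello world" under these assumptions.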

139	2.2 Sent Date

141	   As used in this document, the term "sent date" refers to the date and
142	   time from the Date: header, adjusted by time zone to normalize to
143	   UTC.  For example, "31 Dec 2000 16:01:33 -0800" is equivalent to the
144	   UTC date and time of "1 Jan 2001 00:01:33 +0000".

146	   If the time zone is invalid, the date and time SHOULD be treated as
147	   UTC.  If the time is also invalid, the time SHOULD be treated as
148	   00:00:00.  If there is no valid date or time, the date and time
149	   SHOULD be treated as 00:00:00 on the earliest possible date.

151	   This differs from the date-related criteria in the SEARCH command
152	   (described in [IMAP] section 6.4.4), which use just the date and not
153	   the time, and are not adjusted by time zone.

155	   If the sent date can not be determined (a Date: header is missing or
156	   can not be parsed), the INTERNALDATE for that message is used as the
157	   sent date.

159	   When comparing two sent dates that match exactly, the order in which
160	   the two messages appear in the mailbox (that is, by sequence number)
161	   is used as a tie-breaker to determine the order.
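
A minimal Python sketch of the sent-date rules, assuming the standard
library's email.utils parser is an acceptable stand-in for a real Date:
parser.  The function name sent_date and its parameters are
hypothetical.  The sketch covers UTC normalization, the "treat as UTC"
rule when no usable zone is present, and the INTERNALDATE fallback; it
does not attempt the partial-recovery cases (valid date with an invalid
time).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def sent_date(date_header, internal_date):
    """Return the sent date as a timezone-aware UTC datetime.

    date_header   -- raw value of the Date: header, or None if absent
    internal_date -- timezone-aware INTERNALDATE used as the fallback
    """
    if date_header:
        try:
            parsed = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):   # unparseable header
            parsed = None
        if parsed is not None:
            if parsed.tzinfo is None:     # no usable zone: treat as UTC
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    return internal_date.astimezone(timezone.utc)
```

With the example from the text, "31 Dec 2000 16:01:33 -0800" normalizes
to 1 Jan 2001 00:01:33 UTC.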

163	3. Additional Commands

165	   These commands are extensions to the [IMAP] base protocol.

167	   The section headings are intended to correspond with where they would
168	   be located in the main document if they were part of the base
169	   specification.

171	BASE.6.4.SORT. SORT Command

173	   Arguments:  sort program
174	               charset specification
175	               searching criteria (one or more)

177	   Data:       untagged responses: SORT

179	   Result:     OK - sort completed
180	               NO - sort error: can't sort that charset or
181	                    criteria
182	               BAD - command unknown or arguments invalid

184	      The SORT command is a variant of SEARCH with sorting semantics for
185	      the results.  Sort has two arguments before the searching criteria
186	      argument: a parenthesized list of sort criteria, and the searching
187	      charset.

189	      The charset argument is mandatory (unlike SEARCH) and indicates
190	      the [CHARSET] of the strings that appear in the searching
191	      criteria.  The US-ASCII and UTF-8 charsets MUST be implemented.
192	      All other charsets are optional.

194	      There is also a UID SORT command which returns unique identifiers
195	      instead of message sequence numbers.  Note that there are separate
196	      searching criteria for message sequence numbers and UIDs; thus the
197	      arguments to UID SORT are interpreted the same as in SORT.  This
198	      is analogous to the behavior of UID SEARCH, as opposed to UID
199	      COPY, UID FETCH, or UID STORE.

201	      The SORT command first searches the mailbox for messages that
202	      match the given searching criteria using the charset argument for
203	      the interpretation of strings in the searching criteria.  It then
204	      returns the matching messages in an untagged SORT response, sorted
205	      according to one or more sort criteria.

207	      Sorting is in ascending order.  Earlier dates sort before later
208	      dates; smaller sizes sort before larger sizes; and strings are
209	      sorted according to ascending values established by their
210	      collation algorithm (see under "Internationalization
211	      Considerations").

213	      If two or more messages exactly match according to the sorting
214	      criteria, these messages are sorted according to the order in
215	      which they appear in the mailbox.  In other words, there is an
216	      implicit sort criterion of "sequence number".

218	      When multiple sort criteria are specified, the result is sorted in
219	      the priority order in which the criteria appear.  For example,
220	      (SUBJECT DATE) will sort messages in order by their base subject
221	      text; and for messages with the same base subject text will sort
222	      by their sent date.

224	      Untagged EXPUNGE responses are not permitted while the server is
225	      responding to a SORT command, but are permitted during a UID SORT
226	      command.

228	      The defined sort criteria are as follows.  Refer to the Formal
229	      Syntax section for the precise syntactic definitions of the
230	      arguments.  If the associated RFC-822 header for a particular
231	      criterion is absent, it is treated as the empty string.  The empty
232	      string always collates before non-empty strings.

234	      ARRIVAL
235	         Internal date and time of the message.  This differs from the
236	         ON criterion in SEARCH, which uses just the internal date.

238	      CC
239	         [IMAP] addr-mailbox of the first "cc" address.

241	      DATE
242	         Sent date and time, as described in section 2.2.

244	      FROM
245	         [IMAP] addr-mailbox of the first "From" address.

247	      REVERSE
248	         Followed by another sort criterion, has the effect of that
249	         criterion but in reverse (descending) order.
250	            Note: REVERSE only reverses a single criterion, and does not
251	            affect the implicit "sequence number" sort criterion if all
252	            other criteria are identical.  Consequently, a sort of
253	            REVERSE SUBJECT is not the same as a reverse ordering of a
254	            SUBJECT sort.  This can be avoided by use of additional
255	            criteria, e.g. SUBJECT DATE vs. REVERSE SUBJECT REVERSE
256	            DATE.  In general, however, it's better (and faster, if the
257	            client has a "reverse current ordering" command) to reverse
258	            the results in the client instead of issuing a new SORT.

260	      SIZE
261	         Size of the message in octets.

263	      SUBJECT
264	         Base subject text.

266	      TO
267	         [IMAP] addr-mailbox of the first "To" address.

269	   Example:    C: A282 SORT (SUBJECT) UTF-8 SINCE 1-Feb-1994
270	               S: * SORT 2 84 882
271	               S: A282 OK SORT completed
272	               C: A283 SORT (SUBJECT REVERSE DATE) UTF-8 ALL
273	               S: * SORT 5 3 4 1 2
274	               S: A283 OK SORT completed
275	               C: A284 SORT (SUBJECT) US-ASCII TEXT "not in mailbox"
276	               S: * SORT
277	               S: A284 OK SORT completed
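
The interaction between multiple criteria, REVERSE, and the implicit
"sequence number" criterion can be illustrated with a stable multi-pass
sort.  This is a sketch under simplifying assumptions: messages are
plain dictionaries, the hypothetical sort_messages function takes
(key, reverse) pairs in priority order, and ordinary Python comparison
stands in for the collation described under "Internationalization
Considerations".

```python
def sort_messages(messages, criteria):
    """Return sequence numbers ordered by the given sort criteria.

    messages -- list of dicts with a "seq" key plus one key per criterion
    criteria -- list of (key, reverse) pairs, highest priority first
    """
    # Start from mailbox order so that ties fall back to sequence number.
    result = sorted(messages, key=lambda m: m["seq"])
    # Apply the criteria from lowest to highest priority.  Python's sort
    # is stable even with reverse=True, so REVERSE flips only its own
    # criterion and leaves the tie-breaking order untouched.
    for key, reverse in reversed(criteria):
        result = sorted(result, key=lambda m, k=key: m[k], reverse=reverse)
    return [m["seq"] for m in result]
```

Given messages 1, 2, and 3 with subjects "b", "a", and "a", a SUBJECT
sort gives 2 3 1 while REVERSE SUBJECT gives 1 2 3, which is not the
reverse of the SUBJECT result (1 3 2).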

279	BASE.6.4.THREAD. THREAD Command

281	Arguments:  threading algorithm
282	            charset specification
283	            searching criteria (one or more)

285	Data:       untagged responses: THREAD

287	Result:     OK - thread completed
288	            NO - thread error: can't thread that charset or
289	                 criteria
290	            BAD - command unknown or arguments invalid

292	      The THREAD command is a variant of SEARCH with threading semantics
293	      for the results.  Thread has two arguments before the searching
294	      criteria argument: a threading algorithm, and the searching
295	      charset.

297	      The charset argument is mandatory (unlike SEARCH) and indicates
298	      the [CHARSET] of the strings that appear in the searching
299	      criteria.  The US-ASCII and UTF-8 charsets MUST be implemented.
300	      All other charsets are optional.

302	      There is also a UID THREAD command which returns unique
303	      identifiers instead of message sequence numbers.  Note that there
304	      are separate searching criteria for message sequence numbers and
305	      UIDs; thus the arguments to UID THREAD are interpreted the same as
306	      in THREAD.  This is analogous to the behavior of UID SEARCH, as
307	      opposed to UID COPY, UID FETCH, or UID STORE.

309	      The THREAD command first searches the mailbox for messages that
310	      match the given searching criteria using the charset argument for
311	      the interpretation of strings in the searching criteria.  It then
312	      returns the matching messages in an untagged THREAD response,
313	      threaded according to the specified threading algorithm.

315	      All collation is in ascending order.  Earlier dates collate before
316	      later dates and strings are collated according to ascending values
317	      established by their collation algorithm (see under
318	      "Internationalization Considerations").

320	      Untagged EXPUNGE responses are not permitted while the server is
321	      responding to a THREAD command, but are permitted during a UID
322	      THREAD command.

324	      The defined threading algorithms are as follows:

326	      ORDEREDSUBJECT

328	         The ORDEREDSUBJECT threading algorithm is also referred to as
329	         "poor man's threading."  The searched messages are sorted by
330	         base subject and then by the sent date.  The messages are then
331	         split into separate threads, with each thread containing
332	         messages with the same base subject text.  Finally, the threads
333	         are sorted by the sent date of the first message in the thread.

335	         The first messages of all threads are siblings of each other
336	         (the "root").  The second message of a thread is the child of
337	         the first message, and subsequent messages of the thread are
338	         siblings of the second message and hence children of the
339	         message at the root.  Hence, there are no grandchildren in
340	         ORDEREDSUBJECT threading.

342	         Children in ORDEREDSUBJECT threading do not have descendents.
343	         Client implementations SHOULD treat descendents of a child in
344	         a server response as being siblings of that child.
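
As an illustration, ORDEREDSUBJECT threading might be sketched as
follows.  The function names and the tuple layout are hypothetical;
base subjects are assumed to be already extracted and folded for
comparison, and sent dates are any mutually comparable values.

```python
from itertools import groupby


def ordered_subject(messages):
    """Build THREAD response data from (number, base_subject, sent_date)
    tuples using ORDEREDSUBJECT threading."""
    # Sort by base subject, then sent date (message number breaks ties).
    ordered = sorted(messages, key=lambda m: (m[1], m[2], m[0]))
    # Split into threads of messages sharing a base subject.
    threads = [list(group) for _, group in groupby(ordered, key=lambda m: m[1])]
    # Order the threads by the sent date of their first message.
    threads.sort(key=lambda t: (t[0][2], t[0][0]))
    return "".join(render(t) for t in threads)


def render(thread):
    """Render one thread: the first message is the parent of all others."""
    root = thread[0][0]
    children = [m[0] for m in thread[1:]]
    if not children:
        return f"({root})"
    if len(children) == 1:
        return f"({root} {children[0]})"
    return f"({root} " + "".join(f"({c})" for c in children) + ")"
```

The rendering mirrors the shapes seen in THREAD responses: a lone
message, a parent with one child, and a parent whose several children
are siblings.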

346	      REFERENCES

348	         The REFERENCES threading algorithm threads the searched
349	         messages by grouping them together in parent/child
350	         relationships based on which messages are replies to others.
351	         The parent/child relationships are built using two methods:
352	         reconstructing a message's ancestry using the references
353	         contained within it; and checking the original (not base)
354	         subject of a message to see if it is a reply to (or forward of)
355	         another message.

357	            Note: "Message ID" in the following description refers to a
358	            normalized form of the msg-id in [RFC-2822].  The actual
359	            text in an RFC 2822 message may use quoting, resulting in multiple
360	            ways of expressing the same Message ID.  Implementations of
361	            the REFERENCES threading algorithm MUST normalize any msg-id
362	            in order to avoid false non-matches due to differences in
363	            quoting.

365	            For example, the msg-id
366	               <"01KF8JCEOCBS0045PS"@xxx.yyy.com>
367	            and the msg-id
368	               <01KF8JCEOCBS0045PS@xxx.yyy.com>
369	            MUST be interpreted as being the same Message ID.
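
One possible normalization, shown as a hedged sketch: strip the angle
brackets and, if the left-hand side is a quoted string, remove the
quotes and any backslash escapes.  The function name normalize_msg_id
is hypothetical, and the sketch ignores other msg-id forms (such as
comments or domain literals) that a complete implementation would have
to consider.

```python
import re


def normalize_msg_id(msg_id):
    """Return a comparison form of a msg-id with quoting removed."""
    inner = msg_id.strip()
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    local, sep, domain = inner.rpartition("@")
    if len(local) >= 2 and local[0] == '"' and local[-1] == '"':
        # Drop the surrounding quotes and undo backslash escapes.
        local = re.sub(r"\\(.)", r"\1", local[1:-1])
    return f"{local}{sep}{domain}"
```

Both forms in the example above normalize to the same string.  Case is
deliberately preserved, so Message IDs that differ only in case remain
distinct.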

371	         The references used for reconstructing a message's ancestry are
372	         found using the following rules:

374	            If a message contains a References header line, then use the
375	            Message IDs in the References header line as the references.

377	            If a message does not contain a References header line, or
378	            the References header line does not contain any valid
379	            Message IDs, then use the first (if any) valid Message ID
380	            found in the In-Reply-To header line as the only reference
381	            (parent) for this message.

383	               Note: Although [RFC-2822] permits multiple Message IDs in
384	               the In-Reply-To header, in actual practice this
385	               discipline has not been followed.  For example,
386	               In-Reply-To headers have been observed with message
387	               addresses after the Message ID, and there are no good
388	               heuristics for software to determine the difference.
389	               This is not a problem with the References header however.

391	            If a message does not contain an In-Reply-To header line, or
392	            the In-Reply-To header line does not contain a valid Message
393	            ID, then the message does not have any references (NIL).
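
The three rules above might be sketched as follows.  The header lookup
uses a plain dictionary and the MSG_ID pattern is a deliberately loose
assumption ("angle brackets around text without whitespace"), not the
[RFC-2822] msg-id grammar; the name references_of is hypothetical.

```python
import re

# Loose, assumed approximation of a msg-id token.
MSG_ID = re.compile(r"<[^<>\s]+>")


def references_of(headers):
    """Return the list of reference Message IDs for one message.

    headers -- dict mapping header names to their raw values
    """
    refs = MSG_ID.findall(headers.get("References", ""))
    if refs:
        return refs
    # Fall back to the first valid Message ID in In-Reply-To, if any.
    in_reply_to = MSG_ID.findall(headers.get("In-Reply-To", ""))
    return in_reply_to[:1]
```

An empty list corresponds to the "no references (NIL)" case.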

395	         A message is considered to be a reply or forward if the base
396	         subject extraction rules, applied to the original subject,
397	         remove any of the following: a subj-refwd, a "(fwd)"
398	         subj-trailer, or a subj-fwd-hdr and subj-fwd-trl.

400	         The REFERENCES algorithm is significantly more complex than
401	         ORDEREDSUBJECT and consists of six main steps.  These steps are
402	         outlined in detail below.

404	         (1) For each searched message:

406	            (A) Using the Message IDs in the message's references, link
407	            the corresponding messages (those whose Message-ID header
408	            line contains the given reference Message ID) together as
409	            parent/child.  Make the first reference the parent of the
410	            second (and the second a child of the first), the second the
411	            parent of the third (and the third a child of the second),
412	            etc.  The following rules govern the creation of these
413	            links:

415	               If a message does not contain a Message-ID header line,
416	               or the Message-ID header line does not contain a valid
417	               Message ID, then assign a unique Message ID to this
418	               message.

420	               If two or more messages have the same Message ID, then
421	               only use that Message ID in the first (lowest sequence
422	               number) message, and assign a unique Message ID to each
423	               of the subsequent messages with a duplicate of that
424	               Message ID.

426	               If no message can be found with a given Message ID,
427	               create a dummy message with this ID.  Use this dummy
428	               message for all subsequent references to this ID.

430	               If a message already has a parent, don't change the
431	               existing link.  This is done because the References
432	               header line may have been truncated by a MUA.  As a
433	               result, there is no guarantee that the messages
434	               corresponding to adjacent Message IDs in the References
435	               header line are parent and child.

437	               Do not create a parent/child link if creating that link
438	               would introduce a loop.  For example, before making
439	               message A the parent of B, make sure that A is not a
440	               descendent of B.

442	                  Note: Message ID comparisons are case-sensitive.

444	            (B) Create a parent/child link between the last reference
445	            (or NIL if there are no references) and the current message.
446	            If the current message already has a parent, it is probably
447	            the result of a truncated References header line, so break
448	            the current parent/child link before creating the new
449	            correct one.  As in step 1.A, do not create the parent/child
450	            link if creating that link would introduce a loop.  Note
451	            that if this message has no references, it will now
452	            have no parent.

454	               Note: The parent/child links created in steps 1.A and 1.B
455	               MUST be kept consistent with one another at ALL times.
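
Step (1) can be sketched as follows.  This is an assumed, simplified
rendering: Message IDs are taken to be already normalized and unique
(the missing-ID and duplicate-ID rules are left out), and the names
Node, link, unlink, and build are hypothetical.  Where the text is
silent, the sketch keeps an existing parent in step 1.B if the new link
would introduce a loop.

```python
class Node:
    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.dummy = True        # becomes False once a real message is seen
        self.parent = None
        self.children = []


def would_loop(parent, child):
    """True if making parent the parent of child would create a loop."""
    node = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False


def unlink(child):
    if child.parent is not None:
        child.parent.children.remove(child)
        child.parent = None


def link(parent, child):
    if would_loop(parent, child):
        return
    unlink(child)
    child.parent = parent
    parent.children.append(child)


def build(messages):
    """messages: (msg_id, [reference ids]) pairs in sequence-number order.
    Returns the parentless nodes, i.e. the future children of the root."""
    table = {}
    for msg_id, refs in messages:
        node = table.setdefault(msg_id, Node(msg_id))
        node.dummy = False
        prev = None
        for ref in refs:                       # step 1.A
            cur = table.setdefault(ref, Node(ref))
            if prev is not None and cur.parent is None:
                link(prev, cur)                # never override an existing parent
            prev = cur
        if prev is None:                       # step 1.B
            unlink(node)
        elif node.parent is not prev:
            link(prev, node)
    return [n for n in table.values() if n.parent is None]
```

Because every change goes through link and unlink, a node's parent
pointer and its parent's child list cannot drift apart.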

457	         (2) Gather together all of the messages that have no parents
458	         and make them all children (siblings of one another) of a dummy
459	         parent (the "root").  These messages constitute the first
460	         (head) message of the threads created thus far.

462	         (3) Prune dummy messages from the thread tree.  Traverse each
463	         thread under the root, and for each message:

465	            If it is a dummy message with NO children, delete it.

467	            If it is a dummy message with children, delete it, but
468	            promote its children to the current level.  In other words,
469	            splice them in with the dummy's siblings.

471	            Do not promote the children if doing so would make them
472	            children of the root, unless there is only one child.

474	         (4) Sort the messages under the root (top-level siblings only)
475	         by sent date as described in section 2.2.  In the case of a
476	         dummy message, sort its children by sent date and then use the
477	         first child for the top-level sort.

479	         (5) Gather together messages under the root that have the same
480	         base subject text.

482	            (A) Create a table for associating base subjects with
483	            messages, called the subject table.

485	            (B) Populate the subject table with one message per each
486	            base subject.  For each child of the root:

488	               (i) Find the subject of this thread, by using the base
489	               subject from either the current message or its first
490	               child if the current message is a dummy.  This is the
491	               thread subject.

493	               (ii) If the thread subject is empty, skip this message.

495	               (iii) Look up the message associated with the thread
496	               subject in the subject table.

498	               (iv) If there is no message in the subject table with the
499	               thread subject, add the current message and the thread
500	               subject to the subject table.

502	               Otherwise, if the message in the subject table is not a
503	               dummy, AND either of the following criteria is true:

505	                  The current message is a dummy, OR

507	                  The message in the subject table is a reply or forward
508	                  and the current message is not.

510	            then replace the message in the subject table with the
511	            current message.

513	            (C) Merge threads with the same thread subject.  For each
514	            child of the root:

516	               (i) Find the message's thread subject as in step 5.B.i
517	               above.

519	               (ii) If the thread subject is empty, skip this message.

521	               (iii) Look up the message associated with this thread
522	               subject in the subject table.

524	               (iv) If the message in the subject table is the current
525	               message, skip this message.

527	               Otherwise, merge the current message with the one in the
528	               subject table using the following rules:

530	                  If both messages are dummies, append the current
531	                  message's children to the children of the message in
532	                  the subject table (the children of both messages
533	                  become siblings), and then delete the current message.

535	                  If the message in the subject table is a dummy and the
536	                  current message is not, make the current message a
537	                  child of the message in the subject table (a sibling
538	                  of its children).

540	                  If the current message is a reply or forward and the
541	                  message in the subject table is not, make the current
542	                  message a child of the message in the subject table (a
543	                  sibling of its children).

545	                  Otherwise, create a new dummy message and make both
546	                  the current message and the message in the subject
547	                  table children of the dummy.  Then replace the message
548	                  in the subject table with the dummy message.

550	                     Note: Subject comparisons are case-insensitive, as
551	                     described under "Internationalization
552	                     Considerations."

554	         (6) Traverse the messages under the root and sort each set of
555	         siblings by sent date as described in section 2.2.  Traverse
556	         the messages in such a way that the "youngest" set of siblings
557	         are sorted first, and the "oldest" set of siblings are sorted
558	         last (grandchildren are sorted before children, etc).  In the
559	         case of a dummy message (which can only occur with top-level
560	         siblings), use its first child for sorting.

562	   Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
563	               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
564	                  (173)(174 (175)(176)(178)(181)(180))(179)(177
565	                  (183)(182)(188)(184)(185)(186)(187)(189))(190)
566	                  (191)(192)(193)(194 195)(196 (197)(198))(199)
567	                  (200 202)(201)(203)(204)(205)(206 207)(208)
568	               S: A283 OK THREAD completed
569	               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
570	               S: * THREAD
571	               S: A284 OK THREAD completed
572	               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
573	               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
574	                  (171)(173)((174)(175)(176)(178)(181)(180))
575	                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
576	                  (190)(191)(192)(193)((194)(195 196))(197 198)
577	                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
578	               S: A285 OK THREAD completed

580	        Note: The line breaks in the first and third server
581	        responses are for editorial clarity and do not appear in
582	        real THREAD responses.

584	4. Additional Responses

586	   These responses are extensions to the [IMAP] base protocol.

588	   The section headings of these responses are intended to correspond
589	   with where they would be located in the main document.

591	BASE.7.2.SORT. SORT Response

593	   Data:       zero or more numbers

595	      The SORT response occurs as a result of a SORT or UID SORT
596	      command.  The number(s) refer to those messages that match the
597	      search criteria.  For SORT, these are message sequence numbers;
598	      for UID SORT, these are unique identifiers.  Each number is
599	      delimited by a space.

601	   Example:    S: * SORT 2 3 6
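
	      As an illustration only, a client might decode the untagged
	      SORT response along the following lines.  The function name
	      parse_sort_response is hypothetical and is not part of this
	      specification; the sketch assumes the response arrives as a
	      single line with the trailing CRLF already removed.

```python
def parse_sort_response(line: str) -> list[int]:
    """Parse an untagged SORT response such as '* SORT 2 3 6'.

    Returns the numbers in server order.  Whether they are message
    sequence numbers or UIDs depends on the command that was sent
    (SORT or UID SORT), so the caller has to keep track of that.
    """
    parts = line.strip().split(" ")
    if len(parts) < 2 or parts[0] != "*" or parts[1].upper() != "SORT":
        raise ValueError(f"not a SORT response: {line!r}")
    numbers = []
    for token in parts[2:]:
        # sort-data allows only nz-number values: ASCII digits, non-zero.
        if not (token.isascii() and token.isdigit()) or int(token) == 0:
            raise ValueError(f"invalid number in SORT response: {token!r}")
        numbers.append(int(token))
    return numbers
```

	      For the example above, the sketch yields the list 2, 3, 6; a
	      response with no matching messages ("* SORT") yields an empty
	      list.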

603	BASE.7.2.THREAD. THREAD Response

605	   Data:       zero or more threads

607	      The THREAD response occurs as a result of a THREAD or UID THREAD
608	      command.  It contains zero or more threads.  A thread consists of
609	      a parenthesized list of thread members.

611	      Thread members consist of zero or more message numbers, delimited
612	      by spaces, indicating successive parent and child.  This continues
613	      until the thread splits into multiple sub-threads, at which point
614	      the thread nests into multiple sub-threads with the first member
615	      of each subthread being siblings at this level.  There is no limit
616	      to the nesting of threads.

618	      The message numbers refer to those messages that match the search
619	      criteria.  For THREAD, these are message sequence numbers; for UID
620	      THREAD, these are unique identifiers.

622	   Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

624	      The first thread consists only of message 2.  The second thread
625	      consists of the messages 3 (parent) and 6 (child), after which it
626	      splits into two subthreads; the first of which contains messages 4
627	      (child of 6, sibling of 44) and 23 (child of 4), and the second of
628	      which contains messages 44 (child of 6, sibling of 4), 7 (child of
629	      44), and 96 (child of 7).  Since some later messages are parents
630	      of earlier messages, the messages were probably moved from some
631	      other mailbox at different times.

633	      -- 2

635	      -- 3
636	         \-- 6
637	             |-- 4
638	             |   \-- 23
639	             |
640	             \-- 44
641	                  \-- 7
642	                      \-- 96

644	   Example:    S: * THREAD ((3)(5))

646	      In this example, 3 and 5 are siblings whose parent does not
647	      match the search criteria (and/or does not exist in the mailbox);
648	      however, they are members of the same thread.
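
	      The nesting described above can be decoded with a small
	      recursive parser.  The following is a minimal sketch, not a
	      reference implementation: the names parse_thread_response and
	      thread_edges are hypothetical, the response is assumed to be
	      a single line, and each thread is represented as a Python
	      list whose items are either message numbers or nested lists
	      (sub-threads).

```python
import re


def parse_thread_response(line: str) -> list:
    """Parse '* THREAD (2)(3 6 (4 23)(44 7 96))' into nested lists."""
    prefix, keyword, data = line.strip().partition("THREAD")
    if prefix.strip() != "*" or not keyword:
        raise ValueError(f"not a THREAD response: {line!r}")
    tokens = re.findall(r"\(|\)|\d+", data)
    pos = 0

    def parse_list() -> list:
        nonlocal pos
        pos += 1  # skip the "(" that the caller has already checked
        items = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(parse_list())
            else:
                items.append(int(tokens[pos]))
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced parentheses in THREAD response")
        pos += 1  # skip the closing ")"
        return items

    threads = []
    while pos < len(tokens):
        if tokens[pos] != "(":
            raise ValueError("each thread must start with '('")
        threads.append(parse_list())
    return threads


def thread_edges(thread: list, parent=None) -> list:
    """Return (parent, child) pairs; parent is None when unknown."""
    edges = []
    for item in thread:
        if isinstance(item, list):
            # A nested list is a sub-thread hanging off the current parent.
            edges.extend(thread_edges(item, parent))
        else:
            edges.append((parent, item))
            parent = item  # successive numbers are parent and child
    return edges
```

	      Applied to the first example, thread_edges reports 6 as the
	      parent of both 4 and 44.  Applied to the second example, it
	      reports no known parent for either 3 or 5, which matches the
	      case of a parent that is missing from the result.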

650	5. Formal Syntax of SORT and THREAD Commands and Responses

652	   The following syntax specification uses the Augmented Backus-Naur
653	   Form (ABNF) notation as specified in [ABNF].  It also uses [ABNF]
654	   rules defined in [IMAP].

656	sort            = ["UID" SP] "SORT" SP sort-criteria SP search-criteria

658	sort-criteria   = "(" sort-criterion *(SP sort-criterion) ")"

660	sort-criterion  = ["REVERSE" SP] sort-key

662	sort-key        = "ARRIVAL" / "CC" / "DATE" / "FROM" / "SIZE" /
663	                  "SUBJECT" / "TO"

665	thread          = ["UID" SP] "THREAD" SP thread-alg SP search-criteria

667	thread-alg      = "ORDEREDSUBJECT" / "REFERENCES" / thread-alg-ext

669	thread-alg-ext  = atom
670	                    ; New algorithms MUST be registered with IANA

672	search-criteria = charset 1*(SP search-key)

674	charset         = atom / quoted
675	                    ; CHARSET values MUST be registered with IANA

677	sort-data       = "SORT" *(SP nz-number)

679	thread-data     = "THREAD" [SP 1*thread-list]

681	thread-list     = "(" (thread-members / thread-nested) ")"

683	thread-members  = nz-number *(SP nz-number) [SP thread-nested]

685	thread-nested   = 2*thread-list
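
	   As an illustration of the command grammar above, a client
	   could assemble a SORT command as sketched below.  The helper
	   build_sort_command and its parameters are hypothetical; the
	   sketch assumes the search keys are already formatted as valid
	   [IMAP] search-key tokens and leaves the command tag and the
	   trailing CRLF to the caller.

```python
SORT_KEYS = frozenset(
    {"ARRIVAL", "CC", "DATE", "FROM", "SIZE", "SUBJECT", "TO"}
)


def build_sort_command(criteria, charset, search_keys, uid=False):
    """Return a SORT (or UID SORT) command string without tag or CRLF.

    criteria    -- non-empty list of (sort_key, reverse) pairs
    charset     -- charset name, for example "UTF-8"
    search_keys -- non-empty list of already-formatted search-key tokens
    """
    if not criteria:
        raise ValueError("sort-criteria needs at least one sort-criterion")
    if not search_keys:
        raise ValueError("search-criteria needs at least one search-key")
    rendered = []
    for key, reverse in criteria:
        key = key.upper()
        if key not in SORT_KEYS:
            raise ValueError(f"unknown sort-key: {key}")
        # sort-criterion = ["REVERSE" SP] sort-key
        rendered.append(f"REVERSE {key}" if reverse else key)
    prefix = "UID SORT" if uid else "SORT"
    return f"{prefix} ({' '.join(rendered)}) {charset} {' '.join(search_keys)}"
```

	   For instance, sorting by subject and then by reverse date over
	   messages since 1-Feb-1994 produces the command text
	   "SORT (SUBJECT REVERSE DATE) UTF-8 SINCE 1-Feb-1994".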

687	   The following syntax describes base subject extraction rules (2)-(6):

689	subject         = *subj-leader [subj-middle] *subj-trailer

691	subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

693	subj-blob       = "[" *BLOBCHAR "]" *WSP

695	subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

697	subj-fwd-hdr    = "[fwd:"

699	subj-fwd-trl    = "]"

701	subj-leader     = (*subj-blob subj-refwd) / WSP

703	subj-middle     = *subj-blob (subj-base / subj-fwd)
704	                    ; last subj-blob is subj-base if subj-base would
705	                    ; otherwise be empty

707	subj-trailer    = "(fwd)" / WSP

709	subj-base       = NONWSP *(*WSP NONWSP)
710	                    ; can be a subj-blob

712	BLOBCHAR        = %x01-5a / %x5c / %x5e-ff
713	                    ; any CHAR8 except '[' and ']'

715	NONWSP          = %x01-08 / %x0a-1f / %x21-ff
716	                    ; any CHAR8 other than WSP
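
	   The subject grammar above can be approximated with regular
	   expressions.  The following sketch is one reading of base
	   subject extraction rules (2)-(6) and is not normative: the name
	   base_subject is hypothetical, the input is assumed to have
	   already been decoded and whitespace-normalized as rule (1)
	   requires, and the blob pattern accepts any character other than
	   "[" and "]" rather than exactly BLOBCHAR.

```python
import re

_WSP = r"[ \t]"
_BLOB = rf"\[[^\[\]]*\]{_WSP}*"  # subj-blob (approximation)
_LEADER = re.compile(  # subj-leader
    rf"(?:(?:{_BLOB})*(?:re|fwd?){_WSP}*(?:{_BLOB})?:|{_WSP})",
    re.IGNORECASE,
)
_TRAILER = re.compile(rf"(?:\(fwd\)|{_WSP})\Z", re.IGNORECASE)  # subj-trailer
_BLOB_RE = re.compile(_BLOB)


def base_subject(subject: str) -> str:
    """Approximate the base subject of an already-normalized subject."""
    s = subject
    while True:
        # Rule 2: strip trailing "(fwd)" and whitespace repeatedly.
        while (m := _TRAILER.search(s)):
            s = s[: m.start()]
        # Rules 3-5: strip leaders and blobs until nothing changes.
        changed = True
        while changed:
            changed = False
            m = _LEADER.match(s)
            if m:
                s = s[m.end():]
                changed = True
            m = _BLOB_RE.match(s)
            # Only drop a blob if a non-empty base remains after it.
            if m and s[m.end():]:
                s = s[m.end():]
                changed = True
        # Rule 6: unwrap "[fwd: ... ]" and start over.
        if s[:5].lower() == "[fwd:" and s.endswith("]"):
            s = s[5:-1]
            continue
        return s
```

	   The sketch returns the text in its original case; a real
	   implementation would still have to collate the result as
	   described in Section 7.  A subject that consists only of a blob,
	   such as "[blob]", is returned unchanged, in line with the
	   comment on subj-middle.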

718	6. Security Considerations

720	   The SORT and THREAD extensions do not raise any security
721	   considerations that are not present in the base [IMAP] protocol, and
722	   these issues are discussed in [IMAP].  Nevertheless, it is important
723	   to remember that [IMAP] protocol transactions, including message
724	   data, are sent in the clear over the network unless protection from
725	   snooping is negotiated, whether by the use of STARTTLS, by privacy
726	   protection negotiated in the AUTHENTICATE command, or by some other
727	   protection mechanism.

729	   Although not a security consideration, it is important to recognize
730	   that threading by REFERENCES can lead to misleading threading trees.
731	   For example, a message with false References: header data will cause
732	   a thread to be incorporated into another thread.

734	   The process of extracting the base subject may lead to incorrect
735	   collation if the extracted data was significant text as opposed to
736	   a subject artifact.

738	7. Internationalization Considerations

740	   As stated in the introduction, the rules of I18NLEVEL=1 as described
741	   in [IMAP-I18N] MUST be followed; that is, the SORT and THREAD
742	   extensions MUST collate strings according to the i;unicode-casemap
743	   collation described in [UNICASEMAP].  Servers SHOULD also advertise
744	   the I18NLEVEL=1 extension.  Alternatively, a server MAY implement
745	   I18NLEVEL=2 (or higher) and comply with the rules of that level.

747	   As discussed in [IMAP-I18N] section 4.5, all server implementations
748	   should eventually be updated to support the [IMAP-I18N] I18NLEVEL=2
749	   extension.

751	   Translations of the "re" or "fw"/"fwd" tokens are not specified for
752	   removal in the base subject extraction process.  An attempt to add
753	   such translated tokens would result in a geometrically complex, and
754	   ultimately unimplementable, task.

756	   Instead, note that [RFC-2822] section 3.6.5 recommends that "re:"
757	   (from the Latin "res", in the matter of) be used to identify a reply.
758	   Although the multiple forms of token used to identify a forwarded
759	   message show that there is considerable variation in the wild, the
760	   variations are (still) manageable.  Consequently, it is suggested
761	   that "re:" and one of the variations of the tokens for forward
762	   supported by the base subject extraction rules be adopted for
763	   Internet mail messages, since doing so makes it a simple display-time
764	   task to localize the token language for the user.

766	8. IANA Considerations

768	   [IMAP] capabilities are registered by publishing a standards track or
769	   IESG approved experimental RFC.  This document constitutes
770	   registration of the SORT and THREAD capabilities in the [IMAP]
771	   capabilities registry.

773	   This document creates a new [IMAP] threading algorithms registry,
774	   which registers threading algorithms by publishing a standards track
775	   or IESG approved experimental RFC.  This document constitutes
776	   registration of the ORDEREDSUBJECT and REFERENCES algorithms in that
777	   registry.

779	9. Normative References

781	   The following documents are normative to this document:

783	   [ABNF]                Crocker, D. and Overell, P. "Augmented BNF
784	                         for Syntax Specifications: ABNF", RFC 5234,
785	                         January 2008.

787	   [CHARSET]             Freed, N. and Postel, J. "IANA Character Set
788	                         Registration Procedures", RFC 2978, October
789	                         2000.

791	   [IMAP]                Crispin, M. "Internet Message Access Protocol -
792	                         Version 4rev1", RFC 3501, March 2003.

794	   [IMAP-I18N]           Newman, C. and Gulbrandsen, A. "Internet
795	                         Message Access Protocol Internationalization",
796	                         Work in Progress.

798	   [KEYWORDS]            Bradner, S. "Key words for use in RFCs to
799	                         Indicate Requirement Levels", BCP 14, RFC 2119,
800	                         March 1997.

802	   [RFC-2822]            Resnick, P. "Internet Message Format", RFC
803	                         2822, April 2001.

805	   [UNICASEMAP]          Crispin, M. "i;unicode-casemap - Simple Unicode
806	                         Collation Algorithm", RFC 5051, October 2007.

808	10. Informative References

810	   The following documents are informative to this document:

812	   [IMAP-MODELS]         Crispin, M. "Distributed Electronic Mail Models
813	                         in IMAP4", RFC 1733, December 1994.

815	   [THREADING]           Zawinski, J. "Message Threading",
816	                         http://www.jwz.org/doc/threading.html,
817	                         1997-2002.

819	Appendices

821	Author's Address

823	   Mark R. Crispin
824	   Networks and Distributed Computing
825	   University of Washington
826	   4545 15th Avenue NE
827	   Seattle, WA  98105-4527

829	   Phone: +1 (206) 543-5762

831	   EMail: MRC@CAC.Washington.EDU

833	   Kenneth Murchison
834	   Carnegie Mellon University
835	   5000 Forbes Avenue
836	   Cyert Hall 285
837	   Pittsburgh, PA  15213

839	   Phone: +1 (412) 268-2638
840	   EMail: murch@andrew.cmu.edu

842	Full Copyright Statement

844	   Copyright (C) The IETF Trust (2008).

846	   This document is subject to the rights, licenses and restrictions
847	   contained in BCP 78, and except as set forth therein, the authors
848	   retain all their rights.

850	   This document and the information contained herein are provided on an
851	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
852	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
853	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
854	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
855	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
856	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

858	Intellectual Property

860	   The IETF takes no position regarding the validity or scope of any
861	   Intellectual Property Rights or other rights that might be claimed to
862	   pertain to the implementation or use of the technology described in
863	   this document or the extent to which any license under such rights
864	   might or might not be available; nor does it represent that it has
865	   made any independent effort to identify any such rights.  Information
866	   on the procedures with respect to rights in RFC documents can be
867	   found in BCP 78 and BCP 79.

869	   Copies of IPR disclosures made to the IETF Secretariat and any
870	   assurances of licenses to be made available, or the result of an
871	   attempt made to obtain a general license or permission for the use of
872	   such proprietary rights by implementers or users of this
873	   specification can be obtained from the IETF on-line IPR repository at
874	   http://www.ietf.org/ipr.

876	   The IETF invites any interested party to bring to its attention any
877	   copyrights, patents or patent applications, or other proprietary
878	   rights that may cover technology that may be required to implement
879	   this standard.  Please address the information to the IETF at ietf-
880	   ipr@ietf.org.

882	Acknowledgement

884	   Funding for the RFC Editor function is currently provided by the
885	   Internet Society.