idnits 2.17.1 

draft-ietf-imapext-thread-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 13 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([ABNF], [NEWS]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 83: '...onnected clients MUST use exactly this...'
     RFC 2119 keyword, line 227: '...               MUST be kept consistent...'
     RFC 2119 keyword, line 455: '...plementations of THREAD MUST implement...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (September 2000) is 8624 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ABNF' on line 500 looks like a reference

  -- Missing reference section? 'NEWS' on line 503 looks like a reference


     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IMAP Extensions Working Group                                 M. Crispin
3	Internet Draft: IMAP THREAD                                 K. Murchison
4	                                                          September 2000
5	Document: internet-drafts/draft-ietf-imapext-thread-03.txt

7	          INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION

9	Status of this Memo

11	   This document is an Internet-Draft and is in full conformance with
12	   all provisions of Section 10 of RFC 2026.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups.  Note that
16	   other groups may also distribute working documents as Internet-
17	   Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six months
20	   and may be updated, replaced, or obsoleted by other documents at any
21	   time.  It is inappropriate to use Internet-Drafts as reference
22	   material or to cite them other than as "work in progress."

24	   The list of current Internet-Drafts can be accessed at
25	   http://www.ietf.org/ietf/1id-abstracts.txt

27	   To view the list of Internet-Draft Shadow Directories, see
28	   http://www.ietf.org/shadow.html.

30	   A revised version of this draft document will be submitted to the RFC
31	   editor as a Proposed Standard for the Internet Community.

33	   Discussion and suggestions for improvement are requested, and should
34	   be sent to ietf-imapext@IMC.ORG.  This document will expire before 20
35	   March 2001.  Distribution of this memo is unlimited.

37	Abstract

39	   This document describes the server-based threading extension to the
40	   IMAP4rev1 protocol.  This extension provides substantial performance
41	   improvements for IMAP clients which offer threaded views.

43	   A server which supports this extension indicates this with one or
44	   more capability names consisting of "THREAD=" followed by a supported
45	   threading algorithm name as described in this document.  This
46	   provides for future upwards-compatible extensions.

48	Extracted Subject Text

50	   Threading uses a version of the subject which has specific subject
51	   artifacts of deployed Internet mail software removed.  Due to the
52	   complexity of these artifacts, the formal syntax for the subject
53	   extraction rules is ambiguous.  The following procedure is followed
54	   to determine the actual "base subject" which is used to thread:

56	        (1) Convert any RFC 2047 encoded-words in the subject to
57	        UTF-8.  Convert all tabs and continuations to space.
58	        Convert all multiple spaces to a single space.

60	        (2) Remove all trailing text of the subject that matches
61	        the subj-trailer ABNF, repeat until no more matches are
62	        possible.

64	        (3) Remove all prefix text of the subject that matches the
65	        subj-leader ABNF.

67	        (4) If there is prefix text of the subject that matches the
68	        subj-blob ABNF, and removing that prefix leaves a non-empty
69	        subj-base, then remove the prefix text.

71	        (5) Repeat (3) and (4) until no matches remain.

73	   Note: it is possible to defer step (2) until step (6), but this
74	   requires checking for subj-trailer in step (4).

76	        (6) If the resulting text begins with the subj-fwd-hdr ABNF
77	        and ends with the subj-fwd-trl ABNF, remove the
78	        subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

80	        (7) The resulting text is the "base subject" used in
81	        threading.

83	   All servers and disconnected clients MUST use exactly this algorithm
84	   when threading.  Otherwise there is potential for a user to get
85	   inconsistent results based on whether they are running in connected
86	   or disconnected IMAP mode.
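
The procedure above can be sketched in Python as follows.  This is a
minimal illustration under stated assumptions, not part of the
specification: the name base_subject and the regular expressions are
invented for this example, the input is assumed to have had its
RFC 2047 encoded-words decoded already, and the blob pattern is looser
than BLOBCHAR (it accepts any character other than "[" and "]").

```python
import re

# Hypothetical patterns approximating the subj-* ABNF rules.
BLOB = r"\[[^\[\]]*\][ \t]*"                          # subj-blob
REFWD = r"(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:"       # subj-refwd
LEADER = re.compile(r"(?:(?:" + BLOB + r")*" + REFWD + r"|[ \t])", re.I)
BLOB_RE = re.compile(BLOB)
TRAILER = re.compile(r"(?:\(fwd\)|[ \t])\Z", re.I)    # subj-trailer


def base_subject(subject):
    """Return the base subject of an already-decoded subject string."""
    # Step 1: tabs, continuations and runs of spaces become one space.
    s = re.sub(r"[ \t\r\n]+", " ", subject)
    while True:
        # Step 2: strip trailers until none is left.
        while TRAILER.search(s):
            s = TRAILER.sub("", s)
        # Steps 3-5: strip leaders and leading blobs until no match.
        changed = True
        while changed:
            changed = False
            m = LEADER.match(s)
            if m:
                s = s[m.end():]
                changed = True
                continue
            m = BLOB_RE.match(s)
            if m and s[m.end():]:     # keep the blob if nothing follows
                s = s[m.end():]
                changed = True
        # Step 6: unwrap "[fwd: ... ]" and start over from step 2.
        if s[:5].lower() == "[fwd:" and s.endswith("]"):
            s = s[5:-1]
            continue
        return s                      # Step 7
```

For instance, base_subject("Re: [list] Re: Fwd: hello (fwd)") yields
"hello", while a subject consisting only of "[blob]" is left as is
because removing the blob would leave an empty base subject.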

88	Additional Commands

90	   This command is an extension to the IMAP4rev1 base protocol.

92	   The section header is intended to correspond with where it would be
93	   located in the main document if it was part of the base
94	   specification.

96	6.3.THREAD.     THREAD Command

98	   Arguments:  threading algorithm
99	               charset specification
100	               searching criteria (one or more)

102	   Data:       untagged responses: THREAD

104	   Result:     OK - thread completed
105	               NO - thread error: can't thread that charset or
106	                    criteria
107	               BAD - command unknown or arguments invalid

109	      The THREAD command is a variant of SEARCH with threading semantics
110	      for the results.  THREAD has two arguments before the searching
111	      criteria argument: a threading algorithm and the searching
112	      charset.  Note that unlike SEARCH, the searching charset argument
113	      is mandatory.

115	      There is also a UID THREAD command which corresponds to THREAD the
116	      way that UID SEARCH corresponds to SEARCH.

118	      The THREAD command first searches the mailbox for messages that
119	      match the given searching criteria using the charset argument for
120	      the interpretation of strings in the searching criteria.  It then
121	      returns the matching messages in an untagged THREAD response,
122	      threaded according to the specified threading algorithm.

124	      The defined threading algorithms are as follows:

126	      ORDEREDSUBJECT
127	         The ORDEREDSUBJECT threading algorithm is also referred to as
128	         "poor man's threading."  The searched messages are sorted by
129	         subject and then by sent date, equivalent to a "SORT (SUBJECT
130	         DATE)".  The messages are then split into separate threads,
131	         with each thread containing messages with the same extracted
132	         subject text.  Finally, the threads are sorted by the sent date
133	         of the first message in the thread.

135	         Note that each message in a thread is a child (as opposed to a
136	         sibling) of the previous message.
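
A rough sketch of ORDEREDSUBJECT in Python is given below.  It assumes
each message is supplied as a hypothetical (number, base subject, sent
date) tuple with the base subject already extracted, and it compares
subjects as plain strings rather than with the collation described
under Internationalization Considerations.  Ties on date are broken by
message number, which is an assumption of this sketch.

```python
from itertools import groupby


def ordered_subject_threads(messages):
    """messages: iterable of (number, base_subject, sent_date) tuples.

    Returns the thread data that would follow "* THREAD".
    """
    # Sort by subject, then by sent date (message number breaks ties).
    ordered = sorted(messages, key=lambda m: (m[1], m[2], m[0]))
    # Split into threads of messages sharing the same base subject.
    threads = [list(g) for _, g in groupby(ordered, key=lambda m: m[1])]
    # Sort the threads by the sent date of their first message.
    threads.sort(key=lambda t: (t[0][2], t[0][0]))
    # Each message is a child of the previous one, so a thread is a
    # flat, space-delimited list of numbers.
    return "".join(
        "(%s)" % " ".join(str(m[0]) for m in t) for t in threads
    )
```

With the four messages (1, "b", 3), (2, "a", 1), (3, "b", 2) and
(4, "a", 5), the function returns "(2 4)(3 1)".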

138	      REFERENCES
139	         The REFERENCES threading algorithm is based on the algorithm
140	         written by Jamie Zawinski which was used in "Netscape Mail and
141	         News" versions 2.0 through 3.0.  For details, see
142	         http://www.jwz.org/docs/threading.html.

144	         This algorithm threads the searched messages by grouping them
145	         together in parent/child relationships based on which messages
146	         are replies to others.  The parent/child relationships are
147	         built using two methods: reconstructing a message's ancestry
148	         using the references contained within it; and checking the
149	         subject of a message to see if it is a reply to (or forward of)
150	         another.

152	         The references used for reconstructing a message's ancestry are
153	         found using the following rules:

155	            If a message contains a [NEWS]-style References header line,
156	            then use the Message IDs in the References header line as
157	            the references.

159	            If a message does not contain a References header line, or
160	            the References header line does not contain any valid
161	            Message IDs, then use the first (if any) valid Message ID
162	            found in the In-Reply-To header line as the only reference
163	            (parent) for this message.

165	               NOTE: Although RFC 822 permits multiple Message IDs in
166	               the In-Reply-To header, in actual practice this
167	               discipline has not been followed.  For example, In-
168	               Reply-To headers have been observed with email addresses
169	               after the Message ID, and there are no good heuristics
170	               for software to determine the difference.  This is not a
171	               problem with the References header however.

173	            If a message does not contain an In-Reply-To header line, or
174	            the In-Reply-To header line does not contain a valid Message
175	            ID, then the message does not have any references (NIL).
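
The three rules above might be sketched in Python as follows.  The
MSGID pattern is a deliberately simplified stand-in for a valid
Message ID, and the headers argument is a hypothetical dictionary
mapping header names to unfolded header values; both are assumptions
of this example.

```python
import re

# Simplified Message ID: "<" local "@" domain ">" without spaces.
MSGID = re.compile(r"<[^<>\s]+@[^<>\s]+>")


def message_references(headers):
    """Return the list of reference Message IDs for one message."""
    # Rule 1: use every valid Message ID in the References header.
    refs = MSGID.findall(headers.get("References", ""))
    if refs:
        return refs
    # Rule 2: otherwise use only the first valid Message ID found in
    # the In-Reply-To header.
    in_reply_to = MSGID.findall(headers.get("In-Reply-To", ""))
    # Rule 3: an empty list stands for "no references" (NIL).
    return in_reply_to[:1]
```

Taking only the first In-Reply-To match reflects the NOTE above: any
material after the first Message ID in that header is ignored.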

177	         The REFERENCES algorithm is significantly more complex than
178	         ORDEREDSUBJECT and consists of five main steps.  These steps
179	         are outlined in detail below.

181	         (1) For each searched message:

183	            (A) Using the Message IDs in the message's references, link
184	            the corresponding messages (those whose Message-ID header
185	            line contains the given reference Message ID) together as
186	            parent/child.  Make the first reference the parent of the
187	            second (and the second a child of the first), the second the
188	            parent of the third (and the third a child of the second),
189	            etc.  The following rules govern the creation of these
190	            links:

192	               If a message does not contain a Message-ID header line,
193	               or the Message-ID header line does not contain a valid
194	               Message ID, then assign a unique Message ID to this
195	               message.

197	               If two or more messages have the same Message ID, assign
198	               a unique Message ID to each of the duplicates.

200	               If no message can be found with a given Message ID,
201	               create a dummy message with this ID.  Use this dummy
202	               message for all subsequent references to this ID.

204	               If a message already has a parent, don't change the
205	               existing link.  This is done because the References
206	               header line may have been truncated by a MUA.  As a
207	               result, there is no guarantee that the messages
208	               corresponding to adjacent Message IDs in the References
209	               header line are parent and child.

211	               Do not create a parent/child link if creating that link
212	               would introduce a loop.  For example, before making
213	               message A the parent of B, make sure that A is not a
214	               descendant of B.

216	            (B) Create a parent/child link between the last reference
217	            (or NIL if there are no references) and the current message.
218	            If the current message already has a parent, it is probably
219	            the result of a truncated References header line, so break
220	            the current parent/child link before creating the new
221	            correct one.  As in step 1.A, do not create the parent/child
222	            link if creating that link would introduce a loop.  Note
223	            that if this message has no references, it will now
224	            have no parent.

226	               NOTE: The parent/child links created in steps 1.A and 1.B
227	               MUST be kept consistent with one another at ALL times.

229	         (2) Gather together all of the messages that have no parents
230	         and make them all children (siblings of one another) of a dummy
231	         parent (the "root").  These messages constitute the first
232	         (head) message of the threads created thus far.

234	         (3) Prune dummy messages from the thread tree.  Traverse each
235	         thread under the root, and for each message:

237	            If it is a dummy message with NO children, delete it.

239	            If it is a dummy message with children, delete it, but
240	            promote its children to the current level.  In other words,
241	            splice them in with the dummy's siblings.

243	            Do not promote the children if doing so would make them
244	            children of the root, unless there is only one child.

246	         (4) Gather together messages under the root that have the same
247	         extracted subject text.

249	            (A) Create a table for associating extracted subjects with
250	            messages.

252	            (B) Populate the subject table with one message per
253	            extracted subject.  For each child of the root:

255	               (i) Find the subject of this thread by extracting the
256	               base subject from the current message, or its first child
257	               if the current message is a dummy.

259	               (ii) If the extracted subject is empty, skip this
260	               message.

262	               (iii) Look up the message associated with this extracted
263	               subject in the table.

265	               (iv) If there is no message in the table with this
266	               subject, add the current message and the extracted
267	               subject to the subject table.

269	               Otherwise, replace the message in the table with the
270	               current message if the message in the table is not a
271	               dummy AND either of the following criteria is true:

273	                  The current message is a dummy, OR

275	                  The message in the table is a reply or forward (its
276	                  original subject contains a subj-refwd part and/or a
277	                  "(fwd)" subj-trailer) and the current message is not.

279	            (C) Merge threads with the same subject.  For each child of
280	            the root:

282	               (i) Find the subject of this thread as in step 4.B.i
283	               above.

285	               (ii) If the extracted subject is empty, skip this
286	               message.

288	               (iii) Look up the message associated with this extracted
289	               subject in the table.

291	               (iv) If the message in the table is the current message,
292	               skip this message.

294	               Otherwise, merge the current message with the one in the
295	               table using the following rules:

297	                  If both messages are dummies, append the current
298	                  message's children to the children of the message in
299	                  the table (the children of both messages become
300	                  siblings), and then delete the current message.

302	                  If the message in the table is a dummy and the current
303	                  message is not, make the current message a child of
304	                  the message in the table (a sibling of its children).

306	                  If the current message is a reply or forward and the
307	                  message in the table is not, make the current message
308	                  a child of the message in the table (a sibling of its
309	                  children).

311	                  Otherwise, create a new dummy message and make both
312	                  the current message and the message in the table
313	                  children of the dummy.  Then replace the message in
314	                  the table with the dummy message.

316	         (5) Traverse the messages under the root and sort each set of
317	         siblings by date.  Traverse the messages in such a way that the
318	         "youngest" set of siblings are sorted first, and the "oldest"
319	         set of siblings are sorted last (grandchildren are sorted
320	         before children, etc).  In the case of an exact match on date,
321	         use the order in which the messages appear in the mailbox (that
322	         is, by sequence number) to determine the order.  In the case of
323	         a dummy message (which can only occur with top-level siblings),
324	         use its first child for sorting.
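
Steps 1, 2, 3 and 5 can be illustrated with the Python sketch below;
the subject merging of step 4 is omitted to keep the example short.
All names (Container, link, prune, thread_references) are hypothetical,
messages are assumed to arrive as (sequence number, Message ID or None,
references, date) tuples with comparable dates, and the way a unique
Message ID is generated is an arbitrary choice made for this example.

```python
class Container:
    """One node of the thread tree; message is None for a dummy."""
    def __init__(self, msgid):
        self.msgid, self.message, self.date = msgid, None, None
        self.parent, self.children = None, []

def creates_loop(parent, child):
    """True if child is parent itself or one of parent's ancestors."""
    node = parent
    while node is not None:
        if node is child:
            return True
        node = node.parent
    return False

def link(parent, child, replace=False):
    """Make child a child of parent unless step 1 forbids it."""
    if creates_loop(parent, child):
        return
    if child.parent is not None:
        if not replace:
            return                          # 1.A: keep an existing link
        child.parent.children.remove(child)  # 1.B: break the old link
    child.parent = parent
    parent.children.append(child)

def prune(parent, at_root):
    """Step 3: delete dummies, promoting their children."""
    kept = []
    for child in parent.children:
        prune(child, at_root=False)
        if child.message is not None:
            kept.append(child)
        elif at_root and len(child.children) > 1:
            kept.append(child)              # do not promote to the root
        else:
            kept.extend(child.children)     # none, or promoted children
    parent.children = kept

def sort_key(node):
    first = node if node.message is not None else node.children[0]
    return (first.date, first.message)

def sort_siblings(node):
    """Step 5: sort deeper sets of siblings before shallower ones."""
    for child in node.children:
        sort_siblings(child)
    node.children.sort(key=sort_key)

def thread_references(messages):
    table = {}
    for seq, msgid, refs, date in messages:
        known = table.get(msgid)
        if msgid is None or (known and known.message is not None):
            msgid = "<generated-%d@invalid>" % seq  # missing/duplicate
        node = table.setdefault(msgid, Container(msgid))
        node.message, node.date = seq, date
        prev = None
        for ref in refs:                    # step 1.A
            cur = table.setdefault(ref, Container(ref))
            if prev is not None:
                link(prev, cur)
            prev = cur
        if prev is not None:                # step 1.B
            link(prev, node, replace=True)
        elif node.parent is not None:       # no references: no parent
            node.parent.children.remove(node)
            node.parent = None
    root = Container(None)                  # step 2
    root.children = [n for n in table.values() if n.parent is None]
    prune(root, at_root=True)
    sort_siblings(root)
    return root

def shape(node):
    """Nested (message, children) tuples, handy for inspection."""
    return (node.message, [shape(c) for c in node.children])
```

Because siblings are sorted bottom-up, the first child of a surviving
dummy is already its earliest message by the time the dummy itself is
compared with its own siblings.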

326	   Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
327	               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
328	                  (173)(174 175 176 178 181 180)(179)(177 183
329	                   182 188 184 185 186 187 189)(190)(191)(192)
330	                  (193)(194 195)(196 197 198)(199)(200 202)(201)
331	                  (203)(204)(205)(206 207)(208)
332	               S: A283 OK THREAD completed
333	               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
334	               S: * THREAD
335	               S: A284 OK THREAD completed
336	               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
337	               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
338	                  (171)(173)((174)(175)(176)(178)(181)(180))
339	                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
340	                  (190)(191)(192)(193)((194)(195 196))(197 198)
341	                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
342	               S: A285 OK THREAD completed

344	        Note: The line breaks in the first and third server
345	        responses are for editorial clarity and do not appear in
346	        real THREAD responses.

348	Additional Responses

350	   This response is an extension to the IMAP4rev1 base protocol.

352	   The section heading of this response is intended to correspond with
353	   where it would be located in the main document.

355	7.2.THREAD.     THREAD Response

357	   Data:       zero or more threads

359	      The THREAD response occurs as a result of a THREAD or UID THREAD
360	      command.  It contains zero or more threads.  A thread consists of
361	      a parenthesized list of thread members.

363	      Thread members consist of zero or more message numbers, delimited
364	      by spaces, indicating successive parent and child.  This continues
365	      until the thread splits into multiple sub-threads, at which point
366	      the thread nests into multiple sub-threads with the first member
367	      of each subthread being siblings at this level.  There is no limit
368	      to the nesting of threads.

370	      The message numbers refer to those messages that match the search
371	      criteria.  For THREAD, these are message sequence numbers; for UID
372	      THREAD, these are unique identifiers.

374	   Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

376	      The first thread consists only of message 2.  The second thread
377	      consists of the messages 3 (parent) and 6 (child), after which it
378	      splits into two subthreads; the first of which contains messages 4
379	      (child of 6, sibling of 44) and 23 (child of 4), and the second of
380	      which contains messages 44 (child of 6, sibling of 4), 7 (child of
381	      44), and 96 (child of 7).  Since some later messages are parents
382	      of earlier messages, the messages were probably moved from some
383	      other mailbox at different times.

385	      -- 2

387	      -- 3
388	         \-- 6
389	             |-- 4
390	             |   \-- 23
391	             |
392	             \-- 44
393	                  \-- 7
394	                      \-- 96

396	   Example:    S: * THREAD ((3)(5))

398	      In this example, 3 and 5 are siblings of a parent which does not
399	      match the search criteria (and/or does not exist in the mailbox);
400	      however they are members of the same thread.
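
A client might turn the thread data into a tree along the lines of the
Python sketch below.  The function name and the (number, children)
tuple representation are assumptions of this example, with a number of
None standing for a parent that is absent from the results; malformed
input is not handled.

```python
import re


def parse_thread_response(data):
    """Parse the text after "* THREAD" into (number, children) trees."""
    tokens = re.findall(r"\(|\)|\d+", data)
    pos = 0

    def parse_list():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        # Successive numbers are parent, child, grandchild, ...
        numbers = []
        while tokens[pos].isdigit():
            numbers.append(int(tokens[pos]))
            pos += 1
        # Nested lists are sibling sub-threads of the last number.
        branches = []
        while tokens[pos] == "(":
            branches.append(parse_list())
        assert tokens[pos] == ")"
        pos += 1
        if not numbers:
            return (None, branches)
        node = (numbers[-1], branches)
        for number in reversed(numbers[:-1]):
            node = (number, [node])
        return node

    threads = []
    while pos < len(tokens):
        threads.append(parse_list())
    return threads
```

Applied to "((3)(5))", the parser returns a single thread whose head
is None with the two children 3 and 5, matching the example above.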

402	Formal Syntax of THREAD commands and Responses

404	   thread-data       = "THREAD" [SP 1*thread-list]

406	   thread-list       = "(" (thread-members / thread-nested) ")"

408	   thread-members    = nz-number *(SP nz-number) [SP thread-nested]

410	   thread-nested     = 2*thread-list

412	   thread            = ["UID" SP] "THREAD" SP thread-algorithm
413	                       SP search-charset 1*(SP search-key)

415	   thread-algorithm  = "ORDEREDSUBJECT" / "REFERENCES" / atom

417	   The following syntax describes subject extraction rules (2)-(6):

419	   subject         = *subj-leader [subj-middle] *subj-trailer

421	   subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

423	   subj-blob       = "[" *BLOBCHAR "]" *WSP

425	   subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

427	   subj-fwd-hdr    = "[fwd:"

429	   subj-fwd-trl    = "]"

431	   subj-leader     = (*subj-blob subj-refwd) / WSP

433	   subj-middle     = *subj-blob (subj-base / subj-fwd)
434	                   ; last subj-blob is subj-base if subj-base would
435	                   ; otherwise be empty

437	   subj-trailer    = "(fwd)" / WSP

439	   subj-base       = NONWSP *([*WSP] NONWSP)
440	                   ; can be a subj-blob

442	   BLOBCHAR        = %x01-5a / %x5c / %x5e-7f
443	                   ; any CHAR except '[' and ']'

445	   NONWSP          = %x01-08 / %x0a-1f / %x21-7f
446	                   ; any CHAR other than WSP

448	Security Considerations

450	   Security issues are not discussed in this memo.

452	Internationalization Considerations

454	   By default, strings are threaded according to the "minimum sorting
455	   collation algorithm".  All implementations of THREAD MUST implement
456	   the minimum sorting collation algorithm.

458	   In the minimum sorting collation algorithm, the Basic Latin
459	   alphabetics (U+0041 to U+005A uppercase, U+0061 to U+007A lowercase)
460	   are sorted in a case-insensitive fashion; that is, "A" (U+0041) and
461	   "a" (U+0061) are treated as exact equals.  The characters U+005B to
462	   U+0060 are sorted after the Basic Latin alphabetics; for example,
463	   U+005E is sorted after U+005A and U+007A.  All other characters are
464	   sorted according to their octet values, as expressed in UTF-8.  No
465	   attempt is made to treat composed characters specially, or to do
466	   case-insensitive comparisons of composed characters.

468	        Note: this means, among other things, that the composed
469	        characters in the Latin-1 Supplement are not compared in
470	        what would be considered an ISO 8859-1 "case-insensitive"
471	        fashion.  Case comparison rules for characters with
472	        diacriticals differ between languages; the minimum sorting
473	        collation does not attempt to deal with this at all.  This
474	        is reserved for other sorting collations, which may be
475	        language-specific.
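
One possible reading of the minimum sorting collation is the Python
sort key below.  The function name is hypothetical, and the approach
rests on an interpretation: folding the lowercase Basic Latin letters
to uppercase places U+005B to U+0060 after all alphabetics, and every
other character then compares by its UTF-8 octets.

```python
def minimum_collation_key(text):
    """Sort key: fold a-z to A-Z, then compare UTF-8 octets."""
    folded = "".join(
        chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in text
    )
    return folded.encode("utf-8")
```

Sorting the strings "b", "A", "^" and "a" with this key gives "A",
"a", "b", "^"; characters outside Basic Latin, such as U+00E9 and
U+00C9, are not folded and therefore do not compare equal.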

477	   Other sorting collations, and the ability to change the sorting
478	   collation, will be defined in a separate document dealing with IMAP
479	   internationalization.

481	   It is anticipated that there will be a generic Unicode sorting
482	   collation, which will provide generic case-insensitivity for
483	   alphabetic scripts, specification of composed character handling, and
484	   language-specific sorting collations.  A server which implements
485	   non-default sorting collations will modify its sorting behavior
486	   according to the selected sorting collation.

488	   Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
489	   removal in the extracted subject text process.  By specifying that
490	   only the English forms of the prefixes are used, it becomes a simple
491	   display time task to localize the prefix language for the user.  If,
492	   on the other hand, prefixes in multiple languages are permitted, the
493	   result is a geometrically complex, and ultimately unimplementable,
494	   task.  In order to improve the ability to support non-English display
495	   in Internet mail clients, only the English form of these prefixes
496	   should be transmitted in Internet mail messages.

498	A.      References

500	   [ABNF] Crocker, D., and Overell, P., "Augmented BNF for Syntax
501	   Specifications: ABNF", RFC 2234, November 1997.

503	   [NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET
504	   messages", RFC 1036, AT&T Bell Laboratories and Center for Seismic
505	   Studies, December 1987.
506	Author's Address

508	   Mark R. Crispin
509	   Networks and Distributed Computing
510	   University of Washington
511	   4545 15th Avenue NE
512	   Seattle, WA  98105-4527

514	   Phone: (206) 543-5762

516	   EMail: MRC@CAC.Washington.EDU

518	   Kenneth Murchison
519	   Oceana Matrix Ltd.
520	   21 Princeton Place
521	   Orchard Park, NY 14127

523	   Phone: (716) 662-8973 x26

525	   EMail: ken@oceana.com