idnits 2.17.1 

draft-ietf-imapext-thread-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 13 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The abstract seems to contain references ([ABNF], [NEWS]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 82: '...onnected clients MUST use exactly this...'
     RFC 2119 keyword, line 234: '...               MUST be kept consistent...'
     RFC 2119 keyword, line 476: '...plementations of THREAD MUST implement...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2000) is 8531 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ABNF' on line 521 looks like a reference

  -- Missing reference section? 'NEWS' on line 524 looks like a reference


     Summary: 7 errors (**), 0 flaws (~~), 2 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	IMAP Extensions Working Group                                 M. Crispin
3	Internet Draft: IMAP THREAD                                 K. Murchison
4	Document: internet-drafts/draft-ietf-imapext-thread-06.txt December 2000

6	          INTERNET MESSAGE ACCESS PROTOCOL - THREAD EXTENSION

8	Status of this Memo

10	   This document is an Internet-Draft and is in full conformance with
11	   all provisions of Section 10 of RFC 2026.

13	   Internet-Drafts are working documents of the Internet Engineering
14	   Task Force (IETF), its areas, and its working groups.  Note that
15	   other groups may also distribute working documents as Internet-
16	   Drafts.

18	   Internet-Drafts are draft documents valid for a maximum of six months
19	   and may be updated, replaced, or obsoleted by other documents at any
20	   time.  It is inappropriate to use Internet-Drafts as reference
21	   material or to cite them other than as "work in progress."

23	   The list of current Internet-Drafts can be accessed at
24	   http://www.ietf.org/ietf/1id-abstracts.txt

26	   To view the list of Internet-Draft Shadow Directories, see
27	   http://www.ietf.org/shadow.html.

29	   A revised version of this draft document will be submitted to the RFC
30	   editor as a Proposed Standard for the Internet Community.

32	   Discussion and suggestions for improvement are requested, and should
33	   be sent to ietf-imapext@IMC.ORG.  This document will expire before 27
34	   June 2001.  Distribution of this memo is unlimited.

36	Abstract

38	   This document describes the server-based threading extension to the
39	   IMAP4rev1 protocol.  This extension provides substantial performance
40	   improvements for IMAP clients which offer threaded views.

42	   A server which supports this extension indicates this with one or
43	   more capability names consisting of "THREAD=" followed by a supported
44	   threading algorithm name as described in this document.  This
45	   provides for future upwards-compatible extensions.

47	Extracted Subject Text

49	   Threading uses a version of the subject which has specific subject
50	   artifacts of deployed Internet mail software removed.  Due to the
51	   complexity of these artifacts, the formal syntax for the subject
52	   extraction rules is ambiguous.  The following procedure is followed
53	   to determine the actual "base subject" which is used to thread:

55	        (1) Convert any RFC 2047 encoded-words in the subject to
56	        UTF-8.  Convert all tabs and continuations to space.
57	        Convert all multiple spaces to a single space.

59	        (2) Remove all trailing text of the subject that matches
60	        the subj-trailer ABNF, repeat until no more matches are
61	        possible.

63	        (3) Remove all prefix text of the subject that matches the
64	        subj-leader ABNF.

66	        (4) If there is prefix text of the subject that matches the
67	        subj-blob ABNF, and removing that prefix leaves a non-empty
68	        subj-base, then remove the prefix text.

70	        (5) Repeat (3) and (4) until no matches remain.

72	           Note: it is possible to defer step (2) until step (6),
73	           but this requires checking for subj-trailer in step (4).

75	        (6) If the resulting text begins with the subj-fwd-hdr ABNF
76	        and ends with the subj-fwd-trl ABNF, remove the
77	        subj-fwd-hdr and subj-fwd-trl and repeat from step (2).

79	        (7) The resulting text is the "base subject" used in
80	        threading.

82	   All servers and disconnected clients MUST use exactly this algorithm
83	   when threading.  Otherwise there is potential for a user to get
84	   inconsistent results based on whether they are running in connected
85	   or disconnected IMAP mode.
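
The procedure above can be sketched in Python as follows.  This is a
minimal illustration, not a reference implementation: the function name
base_subject and the regular expressions are assumptions made here, and
the decoding in step (1) relies on Python's email.header module, which
may raise on malformed encoded-words.

```python
import re
from email.header import decode_header, make_header

# Regular-expression renderings of the ABNF rules (assumed, simplified).
BLOB = r"\[[\x01-\x5a\x5c\x5e-\x7f]*\][ \t]*"        # subj-blob
REFWD = r"(?:re|fwd?)[ \t]*(?:" + BLOB + r")?:"       # subj-refwd
LEADER = re.compile(r"(?:" + BLOB + r")*" + REFWD + r"|[ \t]", re.I)
TRAILER = re.compile(r"(?:\(fwd\)|[ \t])\Z", re.I)    # subj-trailer at end
BLOB_RE = re.compile(BLOB)


def base_subject(raw):
    # (1) Decode RFC 2047 encoded-words, then normalize whitespace.
    s = str(make_header(decode_header(raw)))
    s = re.sub(r"[\t\r\n]+", " ", s)
    s = re.sub(r" {2,}", " ", s)
    while True:
        # (2) Strip trailers until none match.
        m = TRAILER.search(s)
        while m:
            s = s[:m.start()]
            m = TRAILER.search(s)
        # (3)-(5) Strip leaders and blobs until nothing changes.
        changed = True
        while changed:
            changed = False
            m = LEADER.match(s)
            if m:
                s, changed = s[m.end():], True
            m = BLOB_RE.match(s)
            if m and s[m.end():]:      # keep the blob if nothing would remain
                s, changed = s[m.end():], True
        # (6) Unwrap "[fwd: ... ]" and start over from step (2).
        if s[:5].lower() == "[fwd:" and s.endswith("]"):
            s = s[5:-1]
            continue
        # (7) What is left is the base subject.
        return s
```

For instance, under these assumptions base_subject("[PATCH] Re: foo")
yields "foo", while a subject consisting only of "[PATCH]" is left
unchanged because removing the blob would leave an empty subj-base.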

87	Sent Date

89	   As used in this document, the term "sent date" refers to the date and
90	   time from the Date: header, adjusted by time zone.  This differs from
91	   date-related criteria in SEARCH, which use just the date, not the
92	   time, and do not adjust by time zone.
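
As an illustration of "adjusted by time zone", the sketch below
normalizes a Date: header value to UTC so that two values can be
compared directly.  The function name sent_date is hypothetical, and
treating a "-0000" zone as UTC is an assumption of this sketch, not
something this document specifies.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def sent_date(date_header):
    """Return the Date: value as an aware UTC datetime, or None."""
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return None                    # header could not be parsed
    if dt.tzinfo is None:
        # parsedate_to_datetime returns a naive value for "-0000";
        # this sketch assumes such values are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

Returning None for an unparseable header lets a caller fall back to
sequence-number order, as the sorting steps of REFERENCES require.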

94	Additional Commands

96	   This command is an extension to the IMAP4rev1 base protocol.

98	   The section header is intended to correspond with where it would be
99	   located in the main document if it were part of the base
100	   specification.

102	6.3.THREAD.     THREAD Command

104	   Arguments:  threading algorithm
105	               charset specification
106	               searching criteria (one or more)

108	   Data:       untagged responses: THREAD

110	   Result:     OK - thread completed
111	               NO - thread error: can't thread that charset or
112	                    criteria
113	               BAD - command unknown or arguments invalid

115	      The THREAD command is a variant of SEARCH with threading semantics
116	      for the results.  THREAD has two arguments before the searching
117	      criteria argument: a threading algorithm and the searching
118	      charset.  Note that unlike SEARCH, the searching charset argument
119	      is mandatory.

121	      There is also a UID THREAD command which corresponds to THREAD the
122	      way that UID SEARCH corresponds to SEARCH.

124	      The THREAD command first searches the mailbox for messages that
125	      match the given searching criteria using the charset argument for
126	      the interpretation of strings in the searching criteria.  It then
127	      returns the matching messages in an untagged THREAD response,
128	      threaded according to the specified threading algorithm.

130	      The defined threading algorithms are as follows:

132	      ORDEREDSUBJECT
133	         The ORDEREDSUBJECT threading algorithm is also referred to as
134	         "poor man's threading."  The searched messages are sorted by
135	         subject and then by the sent date.  The messages are then split
136	         into separate threads, with each thread containing messages
137	         with the same extracted subject text.  Finally, the threads are
138	         sorted by the sent date of the first message in the thread.

140	         Note that each message in a thread is a child (as opposed to a
141	         sibling) of the previous message.
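
A minimal sketch of ORDEREDSUBJECT, assuming each searched message is
given as a (number, base subject, sent date) tuple with the base subject
already extracted.  The function name and the tuple layout are
hypothetical.  The sketch approximates case-insensitive subject
comparison with str.lower(), which is a simplification of the collation
described under "Internationalization Considerations", and it breaks
date ties by message number, which is an added assumption.

```python
from itertools import groupby


def ordered_subject(messages):
    """Return the THREAD response body for ORDEREDSUBJECT."""
    def subject(m):
        return m[1].lower()

    # Sort by subject, then by sent date (message number breaks ties).
    ordered = sorted(messages, key=lambda m: (subject(m), m[2], m[0]))
    # Split into threads of messages sharing the same base subject.
    threads = [list(group) for _, group in groupby(ordered, key=subject)]
    # Sort the threads by the sent date of their first message.
    threads.sort(key=lambda t: (t[0][2], t[0][0]))
    # Each message is a child of the previous one: "(a b c)".
    return "".join(
        "(" + " ".join(str(m[0]) for m in t) + ")" for t in threads
    )
```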

143	      REFERENCES
144	         The REFERENCES threading algorithm is based on the algorithm
145	         written by Jamie Zawinski which was used in "Netscape Mail and
146	         News" versions 2.0 through 3.0.  For details, see
147	         http://www.jwz.org/doc/threading.html.

149	         This algorithm threads the searched messages by grouping them
150	         together in parent/child relationships based on which messages
151	         are replies to others.  The parent/child relationships are
152	         built using two methods: reconstructing a message's ancestry
153	         using the references contained within it; and checking the
154	         subject of a message to see if it is a reply to (or forward of)
155	         another.

157	         The references used for reconstructing a message's ancestry are
158	         found using the following rules:

160	            If a message contains a [NEWS]-style References header line,
161	            then use the Message IDs in the References header line as
162	            the references.

164	            If a message does not contain a References header line, or
165	            the References header line does not contain any valid
166	            Message IDs, then use the first (if any) valid Message ID
167	            found in the In-Reply-To header line as the only reference
168	            (parent) for this message.

170	               Note: Although RFC 822 permits multiple Message IDs in
171	               the In-Reply-To header, in actual practice this
172	               discipline has not been followed.  For example,
173	               In-Reply-To headers have been observed with email
174	               addresses after the Message ID, and there are no good
175	               heuristics for software to determine the difference.
176	               This is not a problem with the References header however.

178	            If a message does not contain an In-Reply-To header line, or
179	            the In-Reply-To header line does not contain a valid Message
180	            ID, then the message does not have any references (NIL).
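
The three rules above might be sketched as follows.  The pattern used to
recognize a "valid" Message ID and the dictionary representation of the
headers are assumptions of this sketch; this document does not define
either.

```python
import re

# Assumed shape of a valid Message ID: "<left@right>" without spaces.
MSG_ID = re.compile(r"<[^<>\s]+@[^<>\s]+>")


def references(headers):
    """Return the list of reference Message IDs for one message.

    headers maps header names to their unfolded values.
    An empty list corresponds to NIL (no references).
    """
    refs = MSG_ID.findall(headers.get("References", ""))
    if refs:
        return refs
    # Fall back to the first valid Message ID in In-Reply-To, if any.
    first = MSG_ID.search(headers.get("In-Reply-To", ""))
    return [first.group(0)] if first else []
```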

182	         The REFERENCES algorithm is significantly more complex than
183	         ORDEREDSUBJECT and consists of six main steps.  These steps are
184	         outlined in detail below.

186	         (1) For each searched message:

188	            (A) Using the Message IDs in the message's references, link
189	            the corresponding messages (those whose Message-ID header
190	            line contains the given reference Message ID) together as
191	            parent/child.  Make the first reference the parent of the
192	            second (and the second a child of the first), the second the
193	            parent of the third (and the third a child of the second),
194	            etc.  The following rules govern the creation of these
195	            links:

197	               If a message does not contain a Message-ID header line,
198	               or the Message-ID header line does not contain a valid
199	               Message ID, then assign a unique Message ID to this
200	               message.

202	               If two or more messages have the same Message ID, assign
203	               a unique Message ID to each of the duplicates.

205	               If no message can be found with a given Message ID,
206	               create a dummy message with this ID.  Use this dummy
207	               message for all subsequent references to this ID.

209	               If a message already has a parent, don't change the
210	               existing link.  This is done because the References
211	               header line may have been truncated by a MUA.  As a
212	               result, there is no guarantee that the messages
213	               corresponding to adjacent Message IDs in the References
214	               header line are parent and child.

216	               Do not create a parent/child link if creating that link
217	               would introduce a loop.  For example, before making
218	               message A the parent of B, make sure that A is not a
219	               descendant of B.

221	                  Note: Message ID comparisons are case-sensitive.

223	            (B) Create a parent/child link between the last reference
224	            (or NIL if there are no references) and the current message.
225	            If the current message already has a parent, it is probably
226	            the result of a truncated References header line, so break
227	            the current parent/child link before creating the new
228	            correct one.  As in step 1.A, do not create the parent/child
229	            link if creating that link would introduce a loop.  Note
230	            that if this message has no references, it will now
231	            have no parent.

233	               Note: The parent/child links created in steps 1.A and 1.B
234	               MUST be kept consistent with one another at ALL times.
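
Step (1) can be sketched with a table of nodes keyed by Message ID.
The names Node, link, and build_links are hypothetical.  The sketch
assumes the caller has already assigned unique Message IDs to messages
that lack one or that share one, and it reads step 1.B as "break the old
link first, then create the new one unless it would form a loop".

```python
class Node:
    def __init__(self, msg_id):
        self.msg_id = msg_id
        self.number = None        # stays None for a dummy message
        self.parent = None
        self.children = []


def is_ancestor(a, b):
    """True if a is b itself or one of b's ancestors."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False


def unlink(child):
    if child.parent is not None:
        child.parent.children.remove(child)
        child.parent = None


def link(parent, child):
    """Link parent -> child unless that would introduce a loop."""
    if not is_ancestor(child, parent):
        child.parent = parent
        parent.children.append(child)


def build_links(messages):
    """messages: iterable of (number, msg_id, refs) tuples."""
    table = {}

    def node(msg_id):
        if msg_id not in table:
            table[msg_id] = Node(msg_id)   # dummy until the message is seen
        return table[msg_id]

    for number, msg_id, refs in messages:
        chain = [node(r) for r in refs]
        # 1.A: link adjacent references; keep any existing parent.
        for parent, child in zip(chain, chain[1:]):
            if child.parent is None:
                link(parent, child)
        # 1.B: the last reference becomes the parent of this message.
        current = node(msg_id)
        current.number = number
        unlink(current)
        if chain:
            link(chain[-1], current)
    return table
```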

236	         (2) Gather together all of the messages that have no parents
237	         and make them all children (siblings of one another) of a dummy
238	         parent (the "root").  These messages constitute the first
239	         (head) message of the threads created thus far.

241	         (3) Prune dummy messages from the thread tree.  Traverse each
242	         thread under the root, and for each message:

244	            If it is a dummy message with NO children, delete it.

246	            If it is a dummy message with children, delete it, but
247	            promote its children to the current level.  In other words,
248	            splice them in with the dummy's siblings.

250	            Do not promote the children if doing so would make them
251	            children of the root, unless there is only one child.
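
The pruning rules of step (3) might look like this in Python.  The Node
class here is a hypothetical stand-in in which a message number of None
marks a dummy; prune is called on the root created in step (2).

```python
class Node:
    def __init__(self, number=None, children=None):
        self.number = number              # None marks a dummy message
        self.children = children or []


def prune(parent, is_root=True):
    """Remove dummy messages below parent, promoting their children."""
    kept = []
    for child in parent.children:
        prune(child, is_root=False)       # handle the subtree first
        if child.number is None:
            if not child.children:
                continue                  # dummy with no children: delete
            if not is_root or len(child.children) == 1:
                kept.extend(child.children)   # splice children in place
                continue
            # Otherwise the dummy stays: promoting several children
            # would make them children of the root.
        kept.append(child)
    parent.children = kept
```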

253	         (4) Sort the messages under the root (top-level siblings only)
254	         by sent date.  In the case of an exact match on sent date or if
255	         either of the Date: headers used in a comparison cannot be
256	         parsed, use the order in which the messages appear in the
257	         mailbox (that is, by sequence number) to determine the order.
258	         In the case of a dummy message, sort its children by sent date
259	         and then use the first child for the top-level sort.

261	         (5) Gather together messages under the root that have the same
262	         extracted subject text.

264	            (A) Create a table for associating extracted subjects with
265	            messages.

267	            (B) Populate the subject table with one message per
268	            extracted subject.  For each child of the root:

270	               (i) Find the subject of this thread by extracting the
271	               base subject from the current message, or its first child
272	               if the current message is a dummy.

274	               (ii) If the extracted subject is empty, skip this
275	               message.

277	               (iii) Lookup the message associated with this extracted
278	               subject in the table.

280	               (iv) If there is no message in the table with this
281	               subject, add the current message and the extracted
282	               subject to the subject table.

284	               Otherwise, replace the message in the table with the
285	               current message if the message in the table is not a
286	               dummy AND either of the following criteria is true:

288	                  The current message is a dummy, OR

290	                  The message in the table is a reply or forward (its
291	                  original subject contains a subj-refwd part and/or a
292	                  "(fwd)" subj-trailer) and the current message is not.

294	            (C) Merge threads with the same subject.  For each child of
295	            the root:

297	               (i) Find the subject of this thread as in step 5.B.i
298	               above.

300	               (ii) If the extracted subject is empty, skip this
301	               message.

303	               (iii) Lookup the message associated with this extracted
304	               subject in the table.

306	               (iv) If the message in the table is the current message,
307	               skip this message.

309	               Otherwise, merge the current message with the one in the
310	               table using the following rules:

312	                  If both messages are dummies, append the current
313	                  message's children to the children of the message in
314	                  the table (the children of both messages become
315	                  siblings), and then delete the current message.

317	                  If the message in the table is a dummy and the current
318	                  message is not, make the current message a child of
319	                  the message in the table (a sibling of its children).

321	                  If the current message is a reply or forward and the
322	                  message in the table is not, make the current message
323	                  a child of the message in the table (a sibling of its
324	                  children).

326	                  Otherwise, create a new dummy message and make both
327	                  the current message and the message in the table
328	                  children of the dummy.  Then replace the message in
329	                  the table with the dummy message.

331	                     Note: Subject comparisons are case-insensitive, as
332	                     described under "Internationalization
333	                     Considerations."
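
Steps 5.B and 5.C can be sketched as below.  All names are hypothetical.
The sketch assumes each node already carries its extracted base subject
and an is_reply flag (set when the original subject contained a
subj-refwd part or a "(fwd)" trailer), and it approximates
case-insensitive comparison with str.lower().

```python
class Node:
    def __init__(self, number=None, subject="", is_reply=False, children=None):
        self.number = number          # None marks a dummy message
        self.subject = subject        # extracted base subject
        self.is_reply = is_reply
        self.children = children or []

    @property
    def is_dummy(self):
        return self.number is None


def thread_subject(node):
    source = node.children[0] if node.is_dummy else node
    return source.subject.lower()


def merge_by_subject(root):
    table = {}
    # 5.B: one message per subject, preferring dummies, then non-replies.
    for node in root.children:
        subj = thread_subject(node)
        if not subj:
            continue
        old = table.get(subj)
        if old is None or (not old.is_dummy and (
                node.is_dummy or (old.is_reply and not node.is_reply))):
            table[subj] = node
    # 5.C: merge every other top-level thread into its table entry.
    for node in list(root.children):
        subj = thread_subject(node)
        if not subj or node not in root.children:
            continue
        other = table[subj]
        if other is node:
            continue
        root.children.remove(node)
        if other.is_dummy:
            if node.is_dummy:
                other.children.extend(node.children)
            else:
                other.children.append(node)
        elif node.is_reply and not other.is_reply:
            other.children.append(node)
        else:
            root.children.remove(other)
            dummy = Node(children=[other, node])
            root.children.append(dummy)
            table[subj] = dummy
```

The sketch appends a newly created dummy at the end of the root's
children; the final position is decided by the sort in step (6).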

335	         (6) Traverse the messages under the root and sort each set of
336	         siblings by sent date.  Traverse the messages in such a way
337	         that the "youngest" set of siblings are sorted first, and the
338	         "oldest" set of siblings are sorted last (grandchildren are
339	         sorted before children, etc).  In the case of an exact match on
340	         sent date or if either of the Date: headers used in a
341	         comparison cannot be parsed, use the order in which the
342	         messages appear in the mailbox (that is, by sequence number) to
343	         determine the order.  In the case of a dummy message (which can
344	         only occur with top-level siblings), use its first child for
345	         sorting.
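
A sketch of the sort in step (6), using a pairwise comparison so that
the sequence-number fallback applies whenever either date is missing.
The Node class and function names are hypothetical; date is any
comparable value, or None when the Date: header cannot be parsed.

```python
from functools import cmp_to_key


class Node:
    def __init__(self, number=None, date=None, children=None):
        self.number = number          # sequence number; None marks a dummy
        self.date = date              # sent date, or None if unparseable
        self.children = children or []


def representative(node):
    # A dummy (top level only) is sorted by its first child.
    return node if node.number is not None else node.children[0]


def compare(a, b):
    a, b = representative(a), representative(b)
    if a.date is not None and b.date is not None and a.date != b.date:
        return -1 if a.date < b.date else 1
    return a.number - b.number        # tie or unparseable date


def sort_tree(node):
    for child in node.children:
        sort_tree(child)              # youngest sibling sets first
    node.children.sort(key=cmp_to_key(compare))
```

Because children are sorted before their parents, a dummy's first child
is already its earliest one by the time the dummy itself is compared.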

347	   Example:    C: A283 THREAD ORDEREDSUBJECT UTF-8 SINCE 5-MAR-2000
348	               S: * THREAD (166)(167)(168)(169)(172)(170)(171)
349	                  (173)(174 175 176 178 181 180)(179)(177 183
350	                   182 188 184 185 186 187 189)(190)(191)(192)
351	                  (193)(194 195)(196 197 198)(199)(200 202)(201)
352	                  (203)(204)(205)(206 207)(208)
353	               S: A283 OK THREAD completed
354	               C: A284 THREAD ORDEREDSUBJECT US-ASCII TEXT "gewp"
355	               S: * THREAD
356	               S: A284 OK THREAD completed
357	               C: A285 THREAD REFERENCES UTF-8 SINCE 5-MAR-2000
358	               S: * THREAD (166)(167)(168)(169)(172)((170)(179))
359	                  (171)(173)((174)(175)(176)(178)(181)(180))
360	                  ((177)(183)(182)(188 (184)(189))(185 186)(187))
361	                  (190)(191)(192)(193)((194)(195 196))(197 198)
362	                  (199)(200 202)(201)(203)(204)(205 206 207)(208)
363	               S: A285 OK THREAD completed

365	        Note: The line breaks in the first and third server
366	        responses are for editorial clarity and do not appear in
367	        real THREAD responses.

369	Additional Responses

371	   This response is an extension to the IMAP4rev1 base protocol.

373	   The section heading of this response is intended to correspond with
374	   where it would be located in the main document.

376	7.2.THREAD.     THREAD Response

378	   Data:       zero or more threads

380	      The THREAD response occurs as a result of a THREAD or UID THREAD
381	      command.  It contains zero or more threads.  A thread consists of
382	      a parenthesized list of thread members.

384	      Thread members consist of zero or more message numbers, delimited
385	      by spaces, indicating successive parent and child.  This continues
386	      until the thread splits into multiple sub-threads, at which point
387	      the thread nests into multiple sub-threads with the first member
388	      of each subthread being siblings at this level.  There is no limit
389	      to the nesting of threads.

391	      The message numbers refer to those messages that match the search
392	      criteria.  For THREAD, these are message sequence numbers; for UID
393	      THREAD, these are unique identifiers.

395	   Example:    S: * THREAD (2)(3 6 (4 23)(44 7 96))

397	      The first thread consists only of message 2.  The second thread
398	      consists of the messages 3 (parent) and 6 (child), after which it
399	      splits into two subthreads; the first of which contains messages 4
400	      (child of 6, sibling of 44) and 23 (child of 4), and the second of
401	      which contains messages 44 (child of 6, sibling of 4), 7 (child of
402	      44), and 96 (child of 7).  Since some later messages are parents
403	      of earlier messages, the messages were probably moved from some
404	      other mailbox at different times.

406	      -- 2

408	      -- 3
409	         \-- 6
410	             |-- 4
411	             |   \-- 23
412	             |
413	             \-- 44
414	                  \-- 7
415	                      \-- 96

417	   Example:    S: * THREAD ((3)(5))

419	      In this example, 3 and 5 are siblings of a parent which does not
420	      match the search criteria (and/or does not exist in the mailbox);
421	      however they are members of the same thread.
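
A client might turn the text following "* THREAD" into a tree along the
lines below.  This is a sketch under stated assumptions: the function
name and the (number, children) tuple representation are hypothetical, a
number of None stands for a parent that is absent from the response, and
malformed input is not handled gracefully.

```python
import re


def parse_thread(data):
    """Parse THREAD response data into a list of (number, children)."""
    tokens = re.findall(r"\(|\)|\d+", data)
    pos = 0

    def thread_list():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        numbers = []
        while tokens[pos].isdigit():          # successive parent and child
            numbers.append(int(tokens[pos]))
            pos += 1
        branches = []
        while tokens[pos] == "(":             # nested sub-threads
            branches.append(thread_list())
        if tokens[pos] != ")":
            raise ValueError("expected ')'")
        pos += 1
        if not numbers:
            return (None, branches)           # siblings of a missing parent
        node = (numbers[-1], branches)
        for number in reversed(numbers[:-1]):
            node = (number, [node])
        return node

    threads = []
    while pos < len(tokens):
        threads.append(thread_list())
    return threads
```

With this representation, the second example above, "((3)(5))", parses
to a single thread whose root number is None and whose children are 3
and 5.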

423	Formal Syntax of THREAD commands and Responses

425	   thread-data       = "THREAD" [SP 1*thread-list]

427	   thread-list       = "(" (thread-members / thread-nested) ")"

429	   thread-members    = nz-number *(SP nz-number) [SP thread-nested]

431	   thread-nested     = 2*thread-list

433	   thread            = ["UID" SP] "THREAD" SP thread-algorithm
434	                       SP search-charset 1*(SP search-key)

436	   thread-algorithm  = "ORDEREDSUBJECT" / "REFERENCES" / atom
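
To show how the thread-list, thread-members, and thread-nested rules fit
together, the sketch below serializes a tree into a thread-list.  The
function names and the (number, children) tuple representation are
hypothetical; a number of None stands for a dummy parent, which is
assumed to have at least two children as thread-nested requires.

```python
def thread_members(node):
    number, children = node
    nested = "".join(thread_list(child) for child in children)
    if number is None:
        return nested                         # thread-nested only
    if not children:
        return str(number)
    if len(children) == 1:
        # A single child continues the run of space-delimited numbers.
        return str(number) + " " + thread_members(children[0])
    return str(number) + " " + nested         # the thread splits here


def thread_list(node):
    return "(" + thread_members(node) + ")"
```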

438	   The following syntax describes subject extraction rules (2)-(6):

440	   subject         = *subj-leader [subj-middle] *subj-trailer

442	   subj-refwd      = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"

444	   subj-blob       = "[" *BLOBCHAR "]" *WSP

446	   subj-fwd        = subj-fwd-hdr subject subj-fwd-trl

448	   subj-fwd-hdr    = "[fwd:"

450	   subj-fwd-trl    = "]"

452	   subj-leader     = (*subj-blob subj-refwd) / WSP

454	   subj-middle     = *subj-blob (subj-base / subj-fwd)
455	                   ; last subj-blob is subj-base if subj-base would
456	                   ; otherwise be empty

458	   subj-trailer    = "(fwd)" / WSP

460	   subj-base       = NONWSP *([*WSP] NONWSP)
461	                   ; can be a subj-blob

463	   BLOBCHAR        = %x01-5a / %x5c / %x5e-7f
464	                   ; any CHAR except '[' and ']'

466	   NONWSP          = %x01-08 / %x0a-1f / %x21-7f
467	                   ; any CHAR other than WSP

469	Security Considerations

471	   Security issues are not discussed in this memo.

473	Internationalization Considerations

475	   By default, strings are threaded according to the "minimum sorting
476	   collation algorithm".  All implementations of THREAD MUST implement
477	   the minimum sorting collation algorithm.

479	   In the minimum sorting collation algorithm, the Basic Latin
480	   alphabetics (U+0041 to U+005A uppercase, U+0061 to U+007A lowercase)
481	   are sorted in a case-insensitive fashion; that is, "A" (U+0041) and
482	   "a" (U+0061) are treated as exact equals.  The characters U+005B to
483	   U+0060 are sorted after the Basic Latin alphabetics; for example,
484	   U+005E is sorted after U+005A and U+007A.  All other characters are
485	   sorted according to their octet values, as expressed in UTF-8.  No
486	   attempt is made to treat composed characters specially, or to do
487	   case-insensitive comparisons of composed characters.

489	        Note: this means, among other things, that the composed
490	        characters in the Latin-1 Supplement are not compared in
491	        what would be considered an ISO 8859-1 "case-insensitive"
492	        fashion.  Case comparison rules for characters with
493	        diacriticals differ between languages; the minimum sorting
494	        collation does not attempt to deal with this at all.  This
495	        is reserved for other sorting collations, which may be
496	        language-specific.
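
One way to realize the minimum sorting collation is a sort key that
folds the Basic Latin lowercase letters to uppercase and otherwise
compares UTF-8 octets.  The function name is hypothetical.  The sketch
relies on Python's bytes.upper(), which converts only ASCII lowercase
letters and leaves all other octets unchanged.

```python
def min_collation_key(text):
    """Sort key approximating the minimum sorting collation."""
    # After folding a-z to A-Z, plain octet order places U+005B to
    # U+0060 after the letters, and leaves composed characters such
    # as U+00E9 untouched.
    return text.encode("utf-8").upper()
```

Two subjects are then considered equal when their keys are equal, and
sorted(subjects, key=min_collation_key) orders them under this
collation.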

498	   Other sorting collations, and the ability to change the sorting
499	   collation, will be defined in a separate document dealing with IMAP
500	   internationalization.

502	   It is anticipated that there will be a generic Unicode sorting
503	   collation, which will provide generic case-insensitivity for
504	   alphabetic scripts, specification of composed character handling, and
505	   language-specific sorting collations.  A server which implements
506	   non-default sorting collations will modify its sorting behavior
507	   according to the selected sorting collation.

509	   Non-English translations of "Re" or "Fw"/"Fwd" are not specified for
510	   removal in the extracted subject text process.  By specifying that
511	   only the English forms of the prefixes are used, it becomes a simple
512	   display time task to localize the prefix language for the user.  If,
513	   on the other hand, prefixes in multiple languages are permitted, the
514	   result is a geometrically complex, and ultimately unimplementable,
515	   task.  In order to improve the ability to support non-English display
516	   in Internet mail clients, only the English form of these prefixes
517	   should be transmitted in Internet mail messages.

519	A.      References

521	   [ABNF] Crocker, D., and Overell, P., "Augmented BNF for Syntax
522	   Specifications: ABNF", RFC 2234, November 1997.

524	   [NEWS] Horton, M., and Adams, R., "Standard for interchange of USENET
525	   messages", RFC 1036, AT&T Bell Laboratories and Center for Seismic
526	   Studies, December 1987.

528	Authors' Addresses

530	   Mark R. Crispin
531	   Networks and Distributed Computing
532	   University of Washington
533	   4545 15th Avenue NE
534	   Seattle, WA  98105-4527

536	   Phone: (206) 543-5762

538	   EMail: MRC@CAC.Washington.EDU

540	   Kenneth Murchison
541	   Oceana Matrix Ltd.
542	   21 Princeton Place
543	   Orchard Park, NY 14127

545	   Phone: (716) 662-8973 x26

547	   EMail: ken@oceana.com