RE: UID SEARCH responses

Mark Crispin <mrc@CAC.Washington.EDU> Mon, 13 August 2007 03:21 UTC

Date: Sun, 12 Aug 2007 20:21:16 -0700
From: Mark Crispin <mrc@CAC.Washington.EDU>
To: Peter Coates <peter.coates@Sun.COM>
cc: 'Pawel Salek' <pawsa@theochem.kth.se>, ietf-imapext@imc.org
Subject: RE: UID SEARCH responses
In-Reply-To: <105201c7dd4f$4e206810$ea613830$%coates@sun.com>
Message-ID: <alpine.OSX.0.999.0708121919050.14725@pangtzu.panda.com>
References: <1186951077l.3538l.2l@nora.salek.zapto.org> <alpine.OSX.0.999.0708121340010.14725@pangtzu.panda.com> <105201c7dd4f$4e206810$ea613830$%coates@sun.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="us-ascii"; format="flowed"
Sender: owner-ietf-imapext@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-imapext/mail-archive/>
List-ID: <ietf-imapext.imc.org>
List-Unsubscribe: <mailto:ietf-imapext-request@imc.org?body=unsubscribe>

On Sun, 12 Aug 2007, Peter Coates wrote:
> Personally I think that having a search result that does not stand alone (as
> in sends a range when some messages in the range do not exist) is bogus.

Why do you think that it does not "stand alone"?  The returned 
sequence-set is explicitly and unambiguously valid as a sequence-set to 
any other IMAP command and will refer to exactly the matching messages 
and no others.

By "stand alone", do you mean "mathematically calculate the UID/msgno 
map"?  If so, where in RFC 4731 is this property stated?

> The results that are returned as a result of a UID SEARCH RETURN (UPDATE)
> command explicitly preclude sending a range (eg 1:4) to mean 1,4 even if 2
> and 3 do not exist.

Where does it "explicitly" state that in RFC 4731?

The syntax for SEARCH results is zero or more space-delimited message 
numbers or UIDs.  Note that this is a different syntax from sequence-set. 
Not only does it not have ranges, it also uses space instead of comma for 
the delimiter.
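To make the SEARCH form concrete, here is a minimal Python sketch of a parser for it. The function name is hypothetical. The sketch assumes the response line has already been read with its CRLF stripped and carries no extension data.

```python
def parse_search_response(line: str) -> list[int]:
    """Parse an untagged SEARCH response such as '* SEARCH 12 3012 4929'."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "*" or tokens[1].upper() != "SEARCH":
        raise ValueError(f"not an untagged SEARCH response: {line!r}")
    # Everything after "* SEARCH" is zero or more space-delimited numbers:
    # no ranges, no commas.
    return [int(tok) for tok in tokens[2:]]
```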

The syntax for ESEARCH results is the RFC 3501 sequence-set rule, which 
has comma-delimited message numbers, UIDs, or ranges (a range being a pair 
of colon-delimited message numbers or UIDs).  It is not unreasonable to 
believe that the use of the same syntax indicates the same semantics.
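For comparison, here is a hypothetical helper that splits an RFC 3501 sequence-set into (low, high) pairs. It is only a sketch. It normalizes reversed ranges, since RFC 3501 treats "m:n" and "n:m" alike, and it does not handle the "*" token.

```python
def parse_sequence_set(text: str) -> list[tuple[int, int]]:
    """Split a sequence-set such as '2,10:11' into (low, high) pairs."""
    ranges = []
    for item in text.split(","):  # items are comma-delimited
        if ":" in item:
            # A range is a colon-delimited pair; order the ends low-to-high.
            a, b = (int(x) for x in item.split(":", 1))
            ranges.append((min(a, b), max(a, b)))
        else:
            n = int(item)  # a bare number is a one-element range
            ranges.append((n, n))
    return ranges
```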

The definition of sequence-set in RFC 3501 explicitly allows UID ranges to 
be used even when there are holes in the UID space.  Nothing in RFC 4731 
states that its definition of sequence-set is different from that of RFC 3501. 
In fact, there is text in RFC 4731 that clearly states that the 
sequence-set returned from ESEARCH is intended to be used as an RFC 3501 
sequence-set.
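Here is a sketch of how a UID sequence-set resolves against a mailbox. It assumes the set of existing UIDs is known, as it is to the server. The function name is illustrative. Holes inside a range simply match nothing, which is why "1:4" is a correct answer when only UIDs 1 and 4 exist.

```python
def resolve_uid_set(seq_set: str, existing_uids: set[int]) -> list[int]:
    """Return the existing UIDs named by a UID sequence-set, ascending.

    '*' is not handled in this sketch.
    """
    selected = set()
    for item in seq_set.split(","):
        lo_s, _, hi_s = item.partition(":")
        lo = int(lo_s)
        hi = int(hi_s) if hi_s else lo  # bare number: lo == hi
        lo, hi = min(lo, hi), max(lo, hi)
        # Only UIDs that actually exist inside the range are selected;
        # nonexistent UIDs (holes) contribute nothing.
        selected.update(u for u in existing_uids if lo <= u <= hi)
    return sorted(selected)
```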

Nowhere in RFC 4731 does it say that an RFC 4731 sequence-set has new 
semantics that prohibit the server from returning ranges that encompass 
holes in monotonically assigned UID space.

If you assert that ESEARCH sequence-set has such semantics, and MUST NOT 
send ranges that have holes, then in effect ESEARCH:
  . creates a new type of sequence-set that is different from all other
    IMAP use of sequence-set
  . forces all ESEARCH results to be longer and more bandwidth-consuming,
    even if the client has the UID map and has no need for such longer
    results
  . forces ESEARCH results to ALWAYS be longer than SEARCH results in any
    server that does not monotonically assign UID values
  . as a consequence of the previous statement, creates an implicit
    requirement, stated nowhere in any specification, that UIDs must be
    monotonically assigned in a mailbox.
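To see the cost of the "no holes" reading, here is a hypothetical encoder that only merges runs of consecutive integers. With sparse UIDs such as 12, 3012, 4929 it cannot merge anything, so every match stays a separate entry.

```python
def compress_no_holes(matches: list[int]) -> str:
    """Build a sequence-set whose ranges never span a missing number.

    Only runs of consecutive integers (e.g. 5,6,7 -> 5:7) are merged.
    """
    parts = []
    nums = sorted(set(matches))
    i = 0
    while i < len(nums):
        j = i
        # Extend the run while the next number is exactly one higher.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}:{nums[j]}")
        i = j + 1
    return ",".join(parts)
```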

I assert that ESEARCH sequence-set has no such semantics, and that it:
  . is compatible with RFC 3501
  . minimizes the results, so that only non-matching messages cause the
    sequence-set to have multiple entries
  . does so even with servers that do not monotonically assign UID values
  . as a consequence of the previous statement, creates no new implicit
    requirements
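By contrast, here is a minimal sketch of the minimizing encoder. It assumes the encoder knows every UID in the mailbox, as the server does. A run of matches is broken only by an existing message that does not match, so gaps in the UID space cost nothing. The function name is hypothetical.

```python
def compress_with_holes(matches: set[int], mailbox_uids: list[int]) -> str:
    """Build a minimal UID sequence-set that may span holes in UID space."""
    parts: list[str] = []
    run: list[int] = []  # [first, last] of the current run of matches

    def flush() -> None:
        # Emit the current run, if any, as "n" or "first:last".
        if run:
            first, last = run
            parts.append(str(first) if first == last else f"{first}:{last}")
            run.clear()

    for uid in sorted(mailbox_uids):
        if uid in matches:
            if run:
                run[1] = uid
            else:
                run.extend([uid, uid])
        else:
            flush()  # an existing non-matching message ends the run
    flush()
    return ",".join(parts)
```

For a mailbox holding UIDs 12, 3012, 4929, ..., 9993, a search matching every message yields "12:9993". A search matching only 12, 3012 and 5948 yields "12:3012,5948", because 4929 exists and does not match.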

This does mean that ESEARCH can't be used to get the UID/msgno map, as 
Pawel complains.  But there are two other commands in RFC 3501 that can 
accomplish that: FETCH and UID SEARCH.  FETCH has always been the official 
way.
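For reference, here is a sketch of building the map the official way, from the untagged responses to a command such as "FETCH 1:* (UID)". The regex and function name are assumptions for illustration. A real client would use a full response parser, since FETCH data can include literals and other items.

```python
import re

# Matches untagged responses such as "* 3 FETCH (UID 4929)". Other data
# items (FLAGS, etc.) may share the list; this only looks for the UID item.
FETCH_UID = re.compile(r"^\* (\d+) FETCH \(.*?\bUID (\d+)", re.IGNORECASE)


def uid_map_from_fetch(lines: list[str]) -> dict[int, int]:
    """Build a msgno -> UID map from untagged FETCH responses."""
    uid_map = {}
    for line in lines:
        m = FETCH_UID.match(line)
        if m:  # tagged completion lines and other responses are skipped
            uid_map[int(m.group(1))] = int(m.group(2))
    return uid_map
```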

It is a fortunate accident that UID SEARCH returns the map as well.  But 
how many applications that use UID SEARCH to obtain the UID/msgno map can 
handle:
 	tag UID SEARCH ALL
 	* SEARCH 4929 3012
 	* SEARCH 12 9993
 	tag OK done
Such a scenario is completely valid in RFC 3501!  If your client can't 
handle that, it's already broken.  Hence that strange warning in RFC 4731:
          [...]  Note, the client
          MUST NOT assume that messages/UIDs will be listed in any
          particular order.
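A client that derives the map from UID SEARCH ALL has to tolerate exactly the scenario above. Here is a sketch with a hypothetical function name. It gathers every untagged SEARCH response before using any of them, then sorts. Because RFC 3501 assigns UIDs in strictly ascending order within a mailbox, the n-th smallest UID belongs to message n.

```python
def uid_map_from_uid_search_all(response_lines: list[str]) -> dict[int, int]:
    """Derive a msgno -> UID map from all responses to 'UID SEARCH ALL'."""
    uids = set()
    for line in response_lines:
        tokens = line.split()
        # Collect from every untagged SEARCH response, in whatever order
        # and however many responses the server chose to send.
        if len(tokens) >= 2 and tokens[0] == "*" and tokens[1].upper() == "SEARCH":
            uids.update(int(t) for t in tokens[2:])
    # UIDs ascend with message sequence numbers, so sorting recovers the map.
    return {msgno: uid for msgno, uid in enumerate(sorted(uids), start=1)}
```

Fed the four lines of the example exchange, it returns {1: 12, 2: 3012, 3: 4929, 4: 9993}.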

> I think it is unwise for a server knowingly to emit responses that are
> likely to cause clients confusion no matter what the legalistic reading of
> the RFC might be.

I will remember that statement the next time someone claims that 
implementations must MIME-process message headers that lack a 
MIME-Version: and accept overlong MIME encoded-word tokens in a header.

Think of the effect of what you are saying.  For the benefit of clients 
that are not satisfied with having TWO ways of getting the UID/msgno map, 
IMAP must have THREE ways of getting the UID/msgno map; and to provide 
that third way, an advertised protocol optimization MUST be reduced, and in 
some cases turned into a chattiness pessimization!  Given:
 	x01 UID SEARCH ALL
 	* SEARCH 12 3012 4929 5948 6666 7739 8199 8839 9017 9993
 	x01 OK done
Compare:
 	x02 UID SEARCH RETURN () ALL
 	* ESEARCH (TAG "x02") UID ALL 12,3012,4929,5948,6666,7739,8199,8839,9017,9993
 	x02 OK done
with:
 	x02 UID SEARCH RETURN () ALL
 	* ESEARCH (TAG "x02") UID ALL 12:9993
 	x02 OK done
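The size difference can be measured directly. This sketch builds the three responses above for the same ten UIDs and prints their lengths in characters. The ESEARCH lines are written in RFC 4731 response syntax, with a quoted tag and the UID indicator.

```python
uids = [12, 3012, 4929, 5948, 6666, 7739, 8199, 8839, 9017, 9993]

# Classic SEARCH: space-delimited UIDs.
search = "* SEARCH " + " ".join(map(str, uids))
# "No holes" reading: none of these UIDs are consecutive, so nothing merges.
esearch_strict = '* ESEARCH (TAG "x02") UID ALL ' + ",".join(map(str, uids))
# Hole-spanning reading: every message matches, so one range suffices.
esearch_ranged = f'* ESEARCH (TAG "x02") UID ALL {uids[0]}:{uids[-1]}'

for name, resp in [("SEARCH", search),
                   ("ESEARCH, no holes", esearch_strict),
                   ("ESEARCH, ranges", esearch_ranged)]:
    print(f"{len(resp):3d}  {name}")
```

It prints 56, 77 and 37. The "optimized" response under the no-holes reading is the longest of the three.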

Furthermore, this pessimization only happens with UIDs.  Message numbers 
won't get pessimized:
 	x03 SEARCH RETURN () ALL
 	* ESEARCH (TAG "x03") ALL 1:10
 	x03 OK done
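Message numbers escape the problem because they always run from 1 to the number of messages with no gaps. A match on every message therefore collapses to a single range, however sparse the UIDs are. The helper below is hypothetical. It also omits ALL entirely when the mailbox is empty, which is a choice made for this sketch.

```python
def esearch_all_msgnos(tag: str, message_count: int) -> str:
    """ESEARCH response to 'SEARCH RETURN () ALL' when every message matches."""
    if message_count == 0:
        # Nothing matched: this sketch sends the response without ALL.
        return f'* ESEARCH (TAG "{tag}")'
    # Sequence numbers are contiguous, so the whole mailbox is 1:n.
    result = "1" if message_count == 1 else f"1:{message_count}"
    return f'* ESEARCH (TAG "{tag}") ALL {result}'
```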

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.