Re: [MORG] Why IMAP extensions are (not) used
Barry Leiba writes:
Locking the mailbox in order to compute RECENT isn't free for
anyone. Scanning for UNSEEN and EXISTS isn't free for anyone. Many
databases and libraries that will use indexes to search quickly are
available to everyone. Maybe the query planner I use is better than
most. So what?
If you have (say) a DB2 back-end to your mail store, a search of 350
mailboxes might be very efficient.
Yes. That's my experience. I was surprised by how fast it was, to tell
the truth.
If you have (say) a set of Unix mailboxes as the back-end to your mail
store, a search of 350 mailboxes might be pretty much the same,
whether you do it in one command or 700, apart from the protocol
overhead.
And because I was surprised by the speed, I did some analysis to find
out what made the difference. It's not just «db good, rest of world
bad».
If that mbox server uses no indexing, then you're right.
If it uses lucene (IIRC that's what Timo mentioned using for his mbox
store) for each mailbox, then the story will be different. In that case
you may see that SELECT is as much effort as the actual searching for
some important searches (and not much improvement for some other
searches) and that RTTs dominate the user experience. It depends on the
search expression, of course.
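
To put a rough number on the RTT part, here is a back-of-the-envelope
sketch. It assumes the client does not pipeline, so each command waits for
the previous reply, and that the per-mailbox approach needs one SELECT plus
one SEARCH per mailbox. The function name and the 50 ms figure are made up
for illustration, and server-side search time is ignored entirely.

```python
def round_trip_ms(mailboxes, rtt_ms, commands_per_mailbox=2):
    """Time spent only on waiting for replies, in milliseconds.

    Assumes no pipelining: every command costs one full round trip.
    commands_per_mailbox=2 models SELECT followed by SEARCH.
    """
    return mailboxes * commands_per_mailbox * rtt_ms


# 350 mailboxes, 700 commands, hypothetical 50 ms round trip.
per_mailbox_total = round_trip_ms(350, 50)

# One multimailbox command: a single round trip.
single_command_total = round_trip_ms(1, 50, commands_per_mailbox=1)

print(per_mailbox_total, single_command_total)
```

A client that pipelines its commands would shrink the first number a lot,
so treat this as the worst case for the 700-command approach.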
If it uses lucene, but has a per-user index instead of a per-mailbox
one, then the best-case improvement is even bigger, particularly when
the results are in few mailboxes.
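
The difference between the two index layouts can be sketched with a toy
inverted index. This is not how lucene or any real server stores things;
the message tuples, the function names and the "probes" counter are all
assumptions made up for the illustration. It only shows why a per-user
index answers in one lookup while per-mailbox indexes have to be probed
one by one, even when most mailboxes hold no match.

```python
from collections import defaultdict


def build_per_mailbox(messages):
    """One word -> UIDs index per mailbox (toy model)."""
    indexes = defaultdict(lambda: defaultdict(set))
    for mailbox, uid, body in messages:
        for word in body.lower().split():
            indexes[mailbox][word].add(uid)
    return indexes


def build_per_user(messages):
    """A single word -> (mailbox, UID) index for the whole user."""
    index = defaultdict(set)
    for mailbox, uid, body in messages:
        for word in body.lower().split():
            index[word].add((mailbox, uid))
    return index


def search_per_mailbox(indexes, mailboxes, word):
    """Probe every mailbox's index; return (hits, number of probes)."""
    hits, probes = set(), 0
    for mailbox in mailboxes:
        probes += 1
        for uid in indexes.get(mailbox, {}).get(word, ()):
            hits.add((mailbox, uid))
    return hits, probes


def search_per_user(index, word):
    """One lookup covers every mailbox; return (hits, number of probes)."""
    return set(index.get(word, ())), 1


# Hypothetical sample data: (mailbox, UID, body text).
messages = [
    ("INBOX", 1, "hello world"),
    ("lists/morg", 7, "hello imap"),
]
mailboxes = ["INBOX", "lists/morg", "Archive"]

print(search_per_mailbox(build_per_mailbox(messages), mailboxes, "hello"))
print(search_per_user(build_per_user(messages), "hello"))
```

Both searches return the same hits. The per-mailbox version needs one
probe per mailbox (three here, 350 in the example above), the per-user
version needs one.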
Using a DB2-like backend will boost multimailbox searches for more and
more varied queries than just a lucene-like index. But the lucene-like
index will be really fast for an important subset of queries (such as a
simple SEARCH BODY x for all the user's mailboxes).
Mark's point is that for a lot of these sorts of things, you kind of
need to know something about the server to know whether doing [X] is
a good idea or not.
Yes... and I have a feeling that time spent implementing and analysing
aids knowledge.
Perhaps one might say that a server SHOULD only implement multimailbox
search if it has the sort of back-end that makes it significantly
more efficient. Or perhaps there should also be some sort of clue to
the client as to the efficiency.
I think not: The efficiency is difficult to describe concisely, and
people will learn from experience which buttons are best not pressed
when the db is big.
Or perhaps clients and servers should do what they like, and the
market will sort out which ones match users' expectations best.
Whatever the decision, Mark's point, which is correct, is that you
can't make assumptions. A client that wants to let a user search 350
mailboxes will do it, one way or another. But it might or might not
perform badly with or without the extension, and you can't know in
advance which it'll be.
But you can write the code, run some searches, analyse the logfiles and
then you'll know quite a bit.
This is a digression. I'll stop now.
Arnt