
Re: [MORG] Why IMAP extensions are (not) used



Barry Leiba writes:
Locking the mailbox in order to compute RECENT isn't free for anyone. Scanning for UNSEEN and EXISTS isn't free for anyone. Many databases and libraries that will use indexes to search quickly are available to everyone. Maybe the query planner I use is better than most. So what?

If you have (say) a DB2 back-end to your mail store, a search of 350 mailboxes might be very efficient.

Yes. That's my experience. I was surprised by how fast it was, to tell the truth.

If you have (say) a set of Unix mailboxes as the back-end to your mail store, a search of 350 mailboxes might be pretty much the same, whether you do it in one command or 700, apart from the protocol overhead.

And because I was surprised by the speed, I did some analysis to find out what made the difference. It's not just «db good, rest of world bad».

If that mbox server uses no indexing, then you're right.

If it uses lucene (IIRC that's what Timo mentioned using for his mbox store) for each mailbox, then the story will be different. In that case you may see that SELECT takes as much effort as the actual searching for some important searches (with not much improvement for some other searches), and that RTTs dominate the user experience. It depends on the search expression, of course.
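To put a rough number on the RTT point, here is a back-of-the-envelope sketch. It assumes the 700 commands mentioned above are one SELECT plus one SEARCH per mailbox, that the client waits for each reply before sending the next command (no pipelining), and that the round-trip time is 50 ms. The 50 ms figure and the function names are illustrative assumptions, not measurements from any server.

```python
def sequential_overhead_ms(n_mailboxes, rtt_ms, commands_per_mailbox=2):
    """Time spent purely waiting on round trips when every command is
    sent only after the previous reply has arrived (no pipelining)."""
    return n_mailboxes * commands_per_mailbox * rtt_ms


def single_command_overhead_ms(rtt_ms):
    """Round-trip wait for one multimailbox search command."""
    return rtt_ms


if __name__ == "__main__":
    n_mailboxes = 350
    rtt_ms = 50  # assumed round-trip time, purely illustrative

    per_mailbox = sequential_overhead_ms(n_mailboxes, rtt_ms)
    one_command = single_command_overhead_ms(rtt_ms)
    print(f"SELECT+SEARCH per mailbox: {per_mailbox} ms of round-trip wait")
    print(f"single multimailbox search: {one_command} ms of round-trip wait")
```

This counts protocol wait only. The server-side work of SELECT and of the search itself comes on top, and that part depends on the back-end.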

If it uses lucene, but has a per-user index instead of a per-mailbox one, then the best-case improvement is even bigger, particularly when the results are in few mailboxes.
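A toy illustration of why the per-user layout helps when the results are in few mailboxes. This is not lucene and not any real server's design: both classes are hypothetical, the index is a plain in-memory dict, and matching is by whole lowercase word (IMAP's SEARCH BODY matches substrings). The only thing the sketch counts is how many index lookups a search across all mailboxes needs.

```python
from collections import defaultdict


class PerMailboxIndex:
    """One term index per mailbox (hypothetical layout)."""

    def __init__(self):
        self._indexes = defaultdict(lambda: defaultdict(set))
        self.probes = 0  # index lookups performed so far

    def add(self, mailbox, uid, body):
        index = self._indexes[mailbox]
        for term in set(body.lower().split()):
            index[term].add(uid)

    def search(self, term):
        term = term.lower()
        hits = {}
        # Every mailbox's index has to be consulted, match or not.
        for mailbox, index in self._indexes.items():
            self.probes += 1
            uids = index.get(term)
            if uids:
                hits[mailbox] = sorted(uids)
        return hits


class PerUserIndex:
    """One term index for all of a user's mailboxes (hypothetical layout)."""

    def __init__(self):
        self._index = defaultdict(set)
        self.probes = 0

    def add(self, mailbox, uid, body):
        for term in set(body.lower().split()):
            self._index[term].add((mailbox, uid))

    def search(self, term):
        # A single lookup yields (mailbox, uid) pairs directly.
        self.probes += 1
        hits = defaultdict(list)
        for mailbox, uid in sorted(self._index.get(term.lower(), ())):
            hits[mailbox].append(uid)
        return dict(hits)


if __name__ == "__main__":
    per_mailbox, per_user = PerMailboxIndex(), PerUserIndex()
    for i in range(350):
        for index in (per_mailbox, per_user):
            index.add(f"mbox{i}", 1, "hello world")
    for index in (per_mailbox, per_user):
        index.add("mbox7", 2, "the needle is here")

    print(per_mailbox.search("needle"), per_mailbox.probes)
    print(per_user.search("needle"), per_user.probes)
```

In this toy the per-mailbox layout consults one index per mailbox even though only one mailbox matches, while the per-user layout does a single lookup. A real index has other costs (size, update contention, per-mailbox access control) that this ignores.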

Using a DB2-like backend will boost multimailbox searches for more, and more varied, queries than just a lucene-like index. But the lucene-like index will be really fast for an important subset of queries (such as a simple SEARCH BODY x for all the user's mailboxes).

Mark's point is that for a lot of these sorts of things, you kind of need to know something about the server to know whether doing [X] is a good idea or not.

Yes... and I have a feeling that time spent implementing and analysing aids knowledge.

Perhaps one might say that a server SHOULD only implement multimailbox search if it has the sort of back-end that makes it significantly more efficient. Or perhaps there should also be some sort of clue to the client as to the efficiency.

I think not: The efficiency is difficult to describe concisely, and people will learn from experience which buttons are best not pressed when the db is big.

Or perhaps clients and servers should do what they like, and the market will sort out which ones match users' expectations best.

Whatever the decision, Mark's point, which is correct, is that you can't make assumptions. A client that wants to let a user search 350 mailboxes will do it, one way or another. But it might or might not perform badly with or without the extension, and you can't know in advance which it'll be.

But you can write the code, run some searches, analyse the logfiles and then you'll know quite a bit.

This is a digression. I'll stop now.

Arnt