PS: In future we will divide the "heavy lifting" among the systems that
participate using distributed parallel processing techniques & cellular
automata mechanisms rather than having it done at a central location.
The amount of work people are throwing at content-analysis stuns me.
When we start talking about taking the kind of computing resources we
normally throw at, say, finding aliens or a cure for cancer, and
pointing them at spam filtering, I think people are really starting to
go overboard. Spam is being presented
to the user via a mechanism which is essentially NP-complete. This
really isn't a battle that can be won this way, because the content
will continue to change (especially as spammers sell more and more
mainstream products). People keep treating this as though it were a
technical problem that can be "solved" in some sense. It's not.
It's an ongoing battle. As such, you should pick those areas where
the enemy has the least flexibility and attack there. In content
they have *infinite* flexibility.