Tony Toews wrote:
In terms of total computation necessary, sure, Bayesian is more expensive than just about anything else. However, keep in mind relative costs: Bayesian filtering often happens on users' desktops, or on a private mail server where the load doesn't affect any public services. Have you heard of any large corporations or ISPs setting up individual Bayesian filters for each user and rejecting mail at the MX using those filters? I certainly haven't.

[snip]

More powerful servers? Well, sure, but what are we doing now to handle spam with word and Bayesian filtering, etc., along with RBL lookups? In addition, the msgid would only be verified by the receiving MTA when the sending MTA is actually sending the email header info. So the msgid is already in RAM, and there is no disk I/O cost to retrieve it.

The basic argument against this is that it is very resource intensive, with a lot of extra costs such as more powerful servers that can handle callbacks, more bandwidth, and extra disk space to store IDs.
More bandwidth? If this kind of thing cuts spam by a significant margin, that's a good tradeoff. Also, the network traffic is only as many bytes as the msgid plus some overhead, along with the ack by the transmitting MTA and the lookup of the sending MTA's IP address from the domain name in the email header.

Yes, the bandwidth savings would probably be excellent. Keep in mind that computation is currently cheaper than storage or anything else, so people are willing to expend a lot of computation.
Disk space? Every email currently has, or should have, a unique msgid.
Yes, but a callback DoS would mean an extra level of indirection to track down the attacker. Look at what happened to grc.com a while back, with what its owner, Steve Gibson, called a "Distributed Reflection Denial of Service". A whole lot of machines started sending TCP SYN packets, forged to carry his address as the source, to core internet routers' BGP ports, and the routers' replies flooded him. Hopefully, there would be some logging on the intermediate reflection MTAs in an attack like this, but it wouldn't mean much.

A DoS could happen today with someone sending lots of emails to a given receiving MTA.

There are also DoS issues: someone forging a domain can cause the forged domain to go under with too many callbacks.
Privacy issues are not likely if this is done with any sort of care at all. We just need to make sure that the numbers used can only be related to message parameters by the server that generated them.

There are privacy issues as well, and possible replay attacks.

I don't see the privacy issues here. Replay attack? Same as a DoS.
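
To make the point about numbers that only the generating server can relate to message parameters a bit more concrete, here is a minimal sketch in Python. It is not part of any proposal discussed in this thread: the msgid layout, the choice of parameters (sender, recipient, timestamp), and the names `SERVER_KEY`, `make_msgid`, and `verify_msgid` are all assumptions made for illustration. The idea is simply a keyed hash (HMAC) over the message parameters, with the key known only to the sending MTA.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-server secret; it never leaves the sending MTA.
SERVER_KEY = secrets.token_bytes(32)


def make_msgid(sender: str, recipient: str, timestamp: int,
               domain: str, key: bytes = SERVER_KEY) -> str:
    """Build a msgid whose local part is a keyed hash of the message parameters."""
    params = f"{sender}\n{recipient}\n{timestamp}".encode()
    tag = hmac.new(key, params, hashlib.sha256).hexdigest()[:32]
    # Assumed layout: <timestamp.tag@domain>
    return f"<{timestamp}.{tag}@{domain}>"


def verify_msgid(msgid: str, sender: str, recipient: str,
                 domain: str, key: bytes = SERVER_KEY) -> bool:
    """Answer a callback by recomputing the msgid from the claimed parameters."""
    local, _, msg_domain = msgid.strip("<>").rpartition("@")
    ts_text, _, _ = local.partition(".")
    try:
        timestamp = int(ts_text)
    except ValueError:
        return False  # malformed msgid
    if msg_domain != domain:
        return False
    expected = make_msgid(sender, recipient, timestamp, domain, key)
    # Constant-time comparison of the presented and recomputed msgid.
    return hmac.compare_digest(msgid.encode(), expected.encode())
```

Under these assumptions, an outsider who sees the msgid cannot tie it back to the sender or recipient without the key, and the sending MTA can answer a callback by recomputing the value rather than looking up a stored ID. The embedded timestamp would also let the server refuse very old msgids, though that check is left out of the sketch.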
Generally, DNS load arguments haven't really held up, except where DNS servers are expected to perform some sort of computation. So the proposal is fine on that front.

OK, this makes sense. But if I understand the proposals, the public key will be retained by the DNS servers, thus increasing the load on those servers. Mind you, the keys would be cached by your "closest" DNS server, so that wouldn't be too bad.

However, the bottom line here is: why go through the trouble of checking every single message? Granted that it increases costs; nevertheless, if we can significantly reduce the problem through verification of MTAs as opposed to senders and messages, why incur the extra costs of message verification? This proposal is on the table, but we will pursue it only if it can be shown to offer a significant advantage, relative to its costs, over other approaches that cost less.