
[Asrg] Collecting statistics



Vernon Schryver wrote:

> > I think a mechanism similar to DCC could be good.  We need a way for
> > lots of sensors to dump information into the collection network.  We
> > need a way to decide what summaries of the data are useful.  And we need
> > a way to extract the data without compromising privacy.  (e.g., report
> > SHA1 hashes of addresses rather than addresses themselves.)
> > ...

> I think I'm qualified to comment on that idea.  The short version of
> my take is "great in theory but nearly hopeless in practice."

OK.

> The hopeless part is that in practice it is extremely difficult to
> build a network large enough to collect enough data to be other than
> a muddy pile of anecdotes.

Well, this is a research group; we consider blue-sky ideas.  If most
people start seeing the level of spam that striker.ottawa.on.ca does,
there will be huge incentive to collaborate.

> The DCC is more than 2 years old, but it
> still sees at most single-digit percentages of all mail in the network
> and perhaps less.

I'm not sure that you need a much higher percentage than that.  If
your sensors are well-dispersed, a couple of percent should be pretty
representative, or at least enough to start detecting trends.

> Even in the skewed population that does use the DCC, there is evidence
> that making generalizations is very hard.  There is a 3X difference in
> spam load per user depending on organization type judging from
> http://www.dcc-servers.net/dcc/graphs/comp-rates

Yes, I see that also among my customers.  But we're looking for trends
across the whole Internet.

[...]
> Consider the dangers of being able to
> ask whether the system has seen a message with a sender of the hash of
> "Bill Gates" and a recipient of the hash of "Steve Jobs" today.

You probably only want hashes of IP addresses, not e-mail addresses.
If the DCC collected not only message hashes, but also the number of
different IPs from which those messages originated, I bet we'd see
some interesting data.
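A minimal sketch of that idea, under stated assumptions: a collector keeps, for each message checksum, the set of distinct source IPs it has been reported from, storing only a hash of each IP. The names `SourceCounter`, `hash_ip`, `COLLECTOR_KEY` and the checksum strings are hypothetical, not part of the DCC. One caveat from Vernon's point above: a plain unsalted SHA1 of an IPv4 address can be reversed by hashing all 2**32 candidates, so this sketch swaps in a keyed hash (HMAC-SHA1) with a secret held only by the collector. That limits outsiders' dictionary lookups, but not the collector's own.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the collector. With an unsalted hash,
# every IPv4 address could be recovered by hashing all 2**32 candidates.
COLLECTOR_KEY = b"example-secret-key"


def hash_ip(ip: str) -> str:
    """Return a keyed SHA1 digest (HMAC-SHA1) of an IP address string."""
    return hmac.new(COLLECTOR_KEY, ip.encode("ascii"), hashlib.sha1).hexdigest()


class SourceCounter:
    """Track, per message checksum, the distinct (hashed) source IPs seen."""

    def __init__(self) -> None:
        self._sources = defaultdict(set)

    def report(self, message_checksum: str, source_ip: str) -> None:
        # A sensor submits the message checksum plus the connecting IP;
        # only the keyed hash of the IP is stored.
        self._sources[message_checksum].add(hash_ip(source_ip))

    def distinct_sources(self, message_checksum: str) -> int:
        # .get() avoids creating an empty entry for unknown checksums.
        return len(self._sources.get(message_checksum, ()))


counter = SourceCounter()
counter.report("chk-a", "192.0.2.1")
counter.report("chk-a", "192.0.2.1")      # same source again: not counted twice
counter.report("chk-a", "198.51.100.7")
counter.report("chk-b", "203.0.113.5")
print(counter.distinct_sources("chk-a"))  # 2
print(counter.distinct_sources("chk-b"))  # 1
```

A message seen from many distinct sources would hint at a botnet-style or widely relayed bulk send, while one seen many times from a single source would look more like a single bulk sender; that distinction is what a bare per-checksum count cannot show.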

--
David.
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg