Re: [Asrg] 2.0 Metrics (Was Re: [Asrg] spamstones)



  Arrgh!  Following up to correct a couple of critical typos below that
I didn't catch when proofreading before sending.  Sorry, I plead
Murphy's law.

On Wed, Apr 02, 2003 at 10:38:23AM -1000, Clifton Royston wrote:
>   I define FP and FN with the provision that they are not allowed to
> be = 0, but otherwise in the standard way:  
> 
>   measure     spam category       ham category 
>   -------     ---------------     --------------
>   flagged     N(flagged-spam)     N(flagged-ham)
>   unflagged   N(unflagged-spam)   N(unflagged-ham)
>   -------     ---------------     --------------
>   total       N(spam)             N(ham)
> 
>   Then FP = max(N(flagged-ham),0.5 ) / N(ham)
>        FN = max(N(unflagged-spam),0.5) / N(spam)
> 
>   Using the minimum of 1.5 for the numerator avoids undefined values in

                         0.5
 
> the log computation, and also deliberately penalizes results claiming
> "zero false positives" or "zero false negatives" if they use a small
> sample size.
> 
>   The tentative definition for "dSpam" is:
>  10 * ( -log10(FP) - log10(FP) + log(1/4) ) 

   10 * ( -log10(FP) - log10(FN) + log(1/4) ) 

   i.e. it includes the log of both the false positive rate and the
false negative rate, not FP twice.
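
   As a rough illustration of the corrected definitions, here is a minimal
Python sketch.  The function names (clamped_rate, dspam) are hypothetical,
and it assumes the unqualified log(1/4) is meant as base 10 like the other
two terms; if another base was intended, the constant offset changes.

```python
import math


def clamped_rate(errors, total, floor=0.5):
    """Error rate whose numerator is floored at `floor`, so it is never 0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return max(errors, floor) / total


def dspam(flagged_ham, n_ham, unflagged_spam, n_spam):
    """Tentative dSpam score from the four counts in the table."""
    fp = clamped_rate(flagged_ham, n_ham)        # false positive rate
    fn = clamped_rate(unflagged_spam, n_spam)    # false negative rate
    # Assumption: log(1/4) is taken as base 10, matching the other terms.
    return 10 * (-math.log10(fp) - math.log10(fn) + math.log10(1 / 4))


if __name__ == "__main__":
    # 0 ham flagged out of 1000, 50 spam missed out of 1000
    print(round(dspam(0, 1000, 50, 1000), 6))
    # same "zero false positives" claim, but from only 10 ham messages
    print(round(dspam(0, 10, 50, 1000), 6))
```

   Under that base-10 assumption, a filter with FP = FN = 0.5 scores 0,
and the second call shows how a "zero false positives" claim from a small
sample gets a lower score than the same claim from a large one.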

  -- Clifton

-- 
     Clifton Royston  --  LavaNet Systems Architect --  cliftonr@lava.net

  "If you ride fast enough, the Specialist can't catch you."
  "What's the Specialist?" Samantha says. 
  "The Specialist wears a hat," says the babysitter. "The hat makes noises."
  She doesn't say anything else.  
                      Kelly Link, _The Specialist's Hat_
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg