
Re: False Positive -was- RE: [Asrg] Fwd: Returned mail: see transcript for details



I love statistics, but could non-spam simply be called "not spam" rather than "ham" in the research-and-report context? The word "spam" creates enough difficulty on its own without adding another "zany techie word."



At 10:45 -1000 4/2/03, Clifton Royston wrote:
On Tue, Apr 01, 2003 at 02:18:34PM -0500, Liudvikas Bukys wrote:
 I slightly prefer the terms accept/reject over negative/positive.

 PROPORTION DEFINITION:

 I think that a 2x2 matrix is most straightforward presentation:

	{accept,reject} x {ham,spam}

         accept   reject   total
 ham     TA       FR       NHAM = TA+FR
 spam    FA       TR       NSPAM = FA+TR
 total   TA+FA    FR+TR    NTOTAL

 and the most helpful intuitive proportions (I think) would be
 FRp = FR/NHAM and FAp = FA/NSPAM.

 Numbers will ALWAYS be dependent on a particular corpus.

 However, I think that the definitions above will be stable
 (measuring classifier quality, not corpus composition) over
 a wide range of ham-spam ratios.  Using FR/NTOTAL or FA/NTOTAL
 will be dominated by ham-spam ratio of the test set, obscuring
 the performance of the classifier, making results unnecessarily
 corpus-specific.
This is exactly correct, IMHO.

Your matrix is a flipped version of the matrix I just presented in
proposing the FP/FN calculations to be used for the "dSpam" measure.
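
To illustrate the point about corpus dependence, here is a minimal Python sketch of the quoted definitions (FRp = FR/NHAM, FAp = FA/NSPAM). The class name `Confusion`, its method names, and the counts in the two test sets are all made up for illustration. Both sets assume the same hypothetical classifier, one that rejects 1% of ham and accepts 5% of spam; only the ham-spam ratio differs.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from the {accept,reject} x {ham,spam} matrix (hypothetical helper)."""
    ta: int  # ham accepted  (true accept)
    fr: int  # ham rejected  (false reject)
    fa: int  # spam accepted (false accept)
    tr: int  # spam rejected (true reject)

    @property
    def nham(self) -> int:
        return self.ta + self.fr

    @property
    def nspam(self) -> int:
        return self.fa + self.tr

    @property
    def ntotal(self) -> int:
        return self.nham + self.nspam

    def frp(self) -> float:
        """False-reject proportion, FR/NHAM."""
        return self.fr / self.nham

    def fap(self) -> float:
        """False-accept proportion, FA/NSPAM."""
        return self.fa / self.nspam


# Made-up counts: same classifier behaviour, different corpus composition.
ham_heavy = Confusion(ta=8910, fr=90, fa=50, tr=950)     # 9000 ham, 1000 spam
spam_heavy = Confusion(ta=990, fr=10, fa=450, tr=8550)   # 1000 ham, 9000 spam

for name, m in (("ham-heavy", ham_heavy), ("spam-heavy", spam_heavy)):
    print(name,
          "FRp=%.3f" % m.frp(),
          "FAp=%.3f" % m.fap(),
          "FR/NTOTAL=%.4f" % (m.fr / m.ntotal),
          "FA/NTOTAL=%.4f" % (m.fa / m.ntotal))
```

With these counts FRp and FAp come out the same for both sets (0.01 and 0.05), while FR/NTOTAL and FA/NTOTAL each change by a factor of nine, tracking the ham-spam ratio rather than the classifier.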

-- Clifton

--
Clifton Royston -- LavaNet Systems Architect -- cliftonr@lava.net

"If you ride fast enough, the Specialist can't catch you."
"What's the Specialist?" Samantha says.
"The Specialist wears a hat," says the babysitter. "The hat makes noises."
She doesn't say anything else. Kelly Link, _The Specialist's Hat_
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg