
Re: False Positive -was- RE: [Asrg] Fwd: Returned mail: see transcript for details



I slightly prefer the terms accept/reject over negative/positive.

PROPORTION DEFINITION:

I think that a 2x2 matrix is the most straightforward presentation
(T/F = true/false, A/R = accept/reject):

	{accept,reject} x {ham,spam}

        accept  reject  total
ham     TA      FR      NHAM = TA+FR
spam    FA      TR      NSPAM = FA+TR
total   TA+FA   FR+TR   NTOTAL

and the most helpful intuitive proportions (I think) would be
FRp = FR/NHAM and FAp = FA/NSPAM.
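
As a minimal sketch of the two proportions in Python (the function
name and the counts are made up for illustration and do not come from
any real corpus):

```python
def error_proportions(ta, fr, fa, tr):
    """Return (FRp, FAp) from the four cells of the 2x2 matrix."""
    nham = ta + fr    # all ham messages, accepted or rejected
    nspam = fa + tr   # all spam messages, accepted or rejected
    if nham == 0 or nspam == 0:
        raise ValueError("need at least one ham and one spam message")
    return fr / nham, fa / nspam


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
frp, fap = error_proportions(ta=990, fr=10, fa=50, tr=950)
print(f"FRp = {frp:.3f}, FAp = {fap:.3f}")
```

Each proportion is normalized by its own row total, so neither one
refers to the size of the other class.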

Numbers will ALWAYS be dependent on a particular corpus.

However, I think that the definitions above will be stable
(measuring classifier quality, not corpus composition) over
a wide range of ham-spam ratios.  FR/NTOTAL and FA/NTOTAL, by
contrast, are dominated by the ham-spam ratio of the test set,
which obscures the performance of the classifier and makes
results unnecessarily corpus-specific.
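
A rough Python sketch of that point, under the assumption of a
hypothetical classifier that always rejects 1% of ham and accepts 5%
of spam (those rates, the corpus sizes, and the function name are all
invented for illustration):

```python
def summarize(nham, nspam, frp=0.01, fap=0.05):
    """Build the matrix for a classifier with fixed per-class error
    rates (an assumption) and report both styles of proportion."""
    fr = round(nham * frp)    # ham wrongly rejected
    fa = round(nspam * fap)   # spam wrongly accepted
    ntotal = nham + nspam
    return {
        "FRp": fr / nham,
        "FAp": fa / nspam,
        "FR/NTOTAL": fr / ntotal,
        "FA/NTOTAL": fa / ntotal,
    }


# Same classifier, three test sets that differ only in ham-spam ratio.
for nham, nspam in [(9000, 1000), (5000, 5000), (1000, 9000)]:
    row = summarize(nham, nspam)
    cells = "  ".join(f"{name}={value:.3f}" for name, value in row.items())
    print(f"ham={nham:5d} spam={nspam:5d}  {cells}")
```

With these made-up numbers FRp and FAp stay at 0.01 and 0.05 in all
three rows, while FR/NTOTAL falls from 0.009 to 0.001 and FA/NTOTAL
rises from 0.005 to 0.045, although the classifier has not changed.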


OTHER MEASURES:

Precision/recall curves are another candidate.

