I have thought about sharing my spam collection with others for use in
developing a better spam filter. We need a large collection of spam to be
able to do various forms of analysis on it. I don't know if such a
collection already exists. If so, I would like to add mine.
My spam filter "idea" is to use keywords, because I use Outlook and Outlook
does not give any other possibilities (as far as I know). The problem is in
choosing the best keywords without using *any* word that occurs in a
non-spam message.
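
A minimal sketch of that selection problem, in Python rather than anything
Outlook provides: collect every word seen in non-spam mail, then rank the
remaining spam words by how many spam messages contain them. The function
names, the tokenizing regex and the sample messages are assumptions made for
illustration only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a message and return the set of distinct words in it."""
    return set(re.findall(r"[a-z0-9'$]+", text.lower()))

def spam_only_keywords(spam_msgs, ham_msgs, top_n=10):
    """Return (word, spam_message_count) pairs for words that never
    occur in any non-spam message, most frequent first."""
    # Every word that appears in at least one non-spam message is disqualified.
    ham_vocab = set()
    for msg in ham_msgs:
        ham_vocab |= tokenize(msg)

    # Count, per candidate word, how many spam messages contain it.
    counts = Counter()
    for msg in spam_msgs:
        counts.update(tokenize(msg) - ham_vocab)
    return counts.most_common(top_n)

# Hypothetical sample data, not taken from any real collection.
spam = ["Buy cheap viagra now", "cheap loans now", "viagra cheap offer"]
ham = ["Meeting now at noon", "Can you offer feedback on the draft"]

for word, n in spam_only_keywords(spam, ham):
    print(word, n)
```

The guarantee only holds for the non-spam mail seen so far, so the keyword
list would have to be rechecked as new legitimate mail arrives.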
We probably also need a definition of spam. A tentative definition could be
"irrelevant messages", where "irrelevant" is judged from the recipient's
subjective perspective. My spam might not be your spam :-)
Any further ideas?
Mikkel Rasmussen
> My spam filter "idea" is to use keywords, because I use Outlook and
> Outlook does not give any other possibilities (as far as I know). The
> problem is in choosing the best keywords ...
You might try the rainbow text classifier
(http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html)
to discover the most informative words for junk e-mail.
There are some papers on using Bayesian classification to do this kind of
filtering. The paper "A Bayesian Approach to Filtering Junk E-Mail" and
Kushmerick's AdEater system spring to mind. If you're interested, these can
probably be found on CiteSeer (http://citeseer.nj.nec.com/cs).
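
To give a rough idea of what "most informative words" means here, the sketch
below scores each word by a smoothed log-odds of appearing in spam versus
non-spam messages. It is a simplified naive Bayes in Python, not rainbow's
implementation and not the exact method of the paper above; the function
names, the add-one smoothing and the choice to ignore absent words are all
assumptions.

```python
import math
import re
from collections import Counter

def words(text):
    """Return the set of distinct lower-cased words in a message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def train(spam_msgs, ham_msgs):
    """Return (log_odds, prior). Both message lists must be non-empty."""
    spam_df, ham_df = Counter(), Counter()
    for m in spam_msgs:
        spam_df.update(words(m))   # number of spam messages containing each word
    for m in ham_msgs:
        ham_df.update(words(m))    # number of non-spam messages containing each word

    n_spam, n_ham = len(spam_msgs), len(ham_msgs)
    log_odds = {}
    for w in set(spam_df) | set(ham_df):
        # Add-one smoothing so a word unseen in one class never gives log(0).
        p_spam = (spam_df[w] + 1) / (n_spam + 2)
        p_ham = (ham_df[w] + 1) / (n_ham + 2)
        log_odds[w] = math.log(p_spam / p_ham)
    prior = math.log(n_spam / n_ham)
    return log_odds, prior

def spam_score(text, log_odds, prior):
    """Sum the evidence of the words present; a score above 0 leans towards spam.
    Simplification: words absent from the message contribute nothing."""
    return prior + sum(log_odds.get(w, 0.0) for w in words(text))

def most_informative(log_odds, top_n=5):
    """Words with the strongest spam evidence first."""
    return sorted(log_odds, key=log_odds.get, reverse=True)[:top_n]
```

Words at the top of `most_informative` are candidate keywords; unlike a strict
"never in non-spam" rule, this ranking tolerates a word that shows up
occasionally in legitimate mail.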
Let me know if this is useful.
AF