[slrn] experiment: can bayesian filtering score usenet posts?

Tavis Ormandy

Dec 19, 2021, 10:32:58 PM
The problem with training spam filters with NNTP is that the protocol is
designed around offering headers and bodies separately.

Sure, in theory you could just download everything at once, but then you
lose all the performance benefits of the protocol. If you could just
score on the XOVER headers, then you would still have all the protocol
benefits, but is that enough data?
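
To make the "XOVER headers only" idea concrete, here is a minimal Python
sketch (not the macro itself) that splits one overview line into its
standard tab-separated fields and rebuilds a small header block that a
filter could read. The names Overview, parse_overview_line and
as_pseudo_headers are made up for illustration, and the choice of which
fields to feed the filter is an assumption.

```python
from typing import NamedTuple


class Overview(NamedTuple):
    """The eight standard fields of one XOVER/OVER response line."""
    number: int
    subject: str
    sender: str
    date: str
    message_id: str
    references: str
    byte_count: int
    line_count: int


def parse_overview_line(line: str) -> Overview:
    """Split a tab-separated overview line; extra trailing fields are ignored."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 8:
        raise ValueError("overview line has fewer than 8 fields")
    number, subject, sender, date, msgid, refs, nbytes, nlines = parts[:8]
    # Byte and line counts may be empty on some servers, so default to 0.
    return Overview(int(number), subject, sender, date, msgid, refs,
                    int(nbytes or 0), int(nlines or 0))


def as_pseudo_headers(ov: Overview) -> str:
    """Render the overview as a header-only message for a text classifier."""
    lines = [
        f"Subject: {ov.subject}",
        f"From: {ov.sender}",
        f"Date: {ov.date}",
        f"Message-ID: {ov.message_id}",
        f"References: {ov.references}",
        f"Lines: {ov.line_count}",
    ]
    return "\n".join(lines) + "\n\n"
```

The point is that everything the filter sees comes from a single
overview response, so no article bodies need to be fetched.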

I decided to try it, and the answer is that it works! *But* it took a
lot of training before it started to work.

I used bogofilter (https://bogofilter.sourceforge.io/) and wrote a macro
to pipe just the overview headers into it. It then auto-generates a
scorefile.
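
As a rough illustration of that pipeline (again in Python, not the
S-Lang macro), the sketch below pipes a header block into bogofilter,
reads its exit code (0 = spam, 1 = non-spam, 2 = unsure), and formats a
slrn scorefile entry keyed on the Message-ID. The score values, the
mapping of "wanted" articles to bogofilter's non-spam class, the helper
names, and the regex escaping rules are all assumptions; the real macro
may build its scorefile quite differently.

```python
import subprocess

# Hypothetical scores per bogofilter exit code: 0 = spam, 1 = non-spam, 2 = unsure.
SCORE_FOR_EXIT = {0: -1000, 1: 100, 2: 0}

# Characters assumed to need a backslash in a slrn scorefile regex.
SPECIAL = set("\\.*+?[]^$")


def classify(pseudo_headers: str, bogofilter: str = "bogofilter") -> int:
    """Pipe the text to bogofilter and return its exit code (0, 1 or 2)."""
    proc = subprocess.run([bogofilter], input=pseudo_headers, text=True,
                          stdout=subprocess.DEVNULL)
    if proc.returncode not in SCORE_FOR_EXIT:
        raise RuntimeError(f"bogofilter failed with exit code {proc.returncode}")
    return proc.returncode


def escape_regex(text: str) -> str:
    """Backslash-escape characters that would otherwise act as regex operators."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def scorefile_entry(group: str, message_id: str, score: int) -> str:
    """Format one scorefile rule that matches a single article by Message-ID."""
    return f"[{group}]\nScore: {score}\n\tMessage-ID: {escape_regex(message_id)}\n"
```

For example, scorefile_entry("comp.test", "<id1@example.invalid>",
SCORE_FOR_EXIT[classify(text)]) would yield one rule per article, and
the rules for a group could be concatenated into the generated file.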

For the last few months, it has been really accurate at identifying the
messages I want to read and I've been finding it really useful. If
anyone else wants to try it out, here is the macro I used:

https://lock.cmpxchg8b.com/files/bogofilter.sl

The macro automatically learns any articles you read when you leave a
group. If a message had a positive score, it is learned as good. If it
had a very low score, it is learned as bad.
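
The learning rule can be sketched in Python roughly as follows.
bogofilter's -n flag registers the input as non-spam and -s registers
it as spam; treating "good" as non-spam, the VERY_LOW threshold of
-1000, and the function names are assumptions, since the post does not
say what counts as "very low". Error handling is left out.

```python
import subprocess
from typing import Optional

VERY_LOW = -1000  # hypothetical cutoff for "a very low score"


def training_flag(score: int, very_low: int = VERY_LOW) -> Optional[str]:
    """Pick the bogofilter registration flag for an article that was read."""
    if score > 0:
        return "-n"   # positive score: learn as good (non-spam)
    if score <= very_low:
        return "-s"   # very low score: learn as bad (spam)
    return None       # anything in between is not used for training


def learn(pseudo_headers: str, score: int, bogofilter: str = "bogofilter") -> bool:
    """Register one read article with bogofilter; return True if it was trained."""
    flag = training_flag(score)
    if flag is None:
        return False
    subprocess.run([bogofilter, flag], input=pseudo_headers, text=True)
    return True
```

Skipping the articles in the middle keeps ambiguous examples out of the
word lists, which is one plausible reading of why only clearly scored
articles are learned.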

Tavis.

--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger tav...@sdf.org
_\_V _( ) _( ) @taviso

HenHanna

Feb 21, 2024, 5:56:24 PM
Tavis.

--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger tav...@sdf.org
_\_V _( ) _( ) @taviso



------------ i love your .SIG lines !!!