Getting spamassassin and clamav as inn filters

The Doctor

Oct 9, 2023, 10:57:17 AM
Any recipes how?
--
Member - Liberal International This is doc...@nk.ca Ici doc...@nk.ca
Yahweh, King & country!Never Satan President Republic!Beware AntiChrist rising!
Look at Psalms 14 and 53 on Atheism https://www.empire.kred/ROOTNK?t=94a1f39b
An oil stain on the carpet is not removed by picking up the litter. -unknown Beware https://mindspring.com

Gea-Suan Lin

Oct 12, 2023, 12:26:18 AM
On 2023-10-09, The Doctor <doc...@doctor.nl2k.ab.ca> wrote:
> Any recipes how?

Yes, I just implemented a simple hack in `cleanfeed.local`. I have
tried it, but it was not very useful: a lot of spam still gets into
comp.lang.c and other groups.

For now, the most efficient way to avoid Google Groups spam is simply
to drop everything coming from Google Groups.

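```
use Mail::SpamAssassin;

# Create the SpamAssassin object once, when the filter is loaded.
my $sa_agent = Mail::SpamAssassin->new();

sub local_filter_last {
    # Only check articles that passed through Google Groups.
    return unless $hdr{Path} =~ /google-groups\.googlegroups\.com/;

    # Copy the headers, dropping the pseudo-headers for body and line count.
    my %myhdr = %hdr;
    delete $myhdr{__BODY__};
    delete $myhdr{__LINES__};

    # Rebuild a mail-like message: headers, blank line, body.
    my $header_str = join "\n", map { "$_: $myhdr{$_}" } keys %myhdr;
    my $article_str = "$header_str\n\n$hdr{__BODY__}";

    my $mail = $sa_agent->parse($article_str);
    my $status = $sa_agent->check($mail);
    my $is_spam = $status->is_spam();

    # Release both objects before returning, on the spam path too.
    $status->finish();
    $mail->finish();

    return reject("Reject Google Groups posting to $hdr{Newsgroups} by SpamAssassin") if $is_spam;

    return;
}
```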

Ray Banana

Oct 12, 2023, 1:44:32 AM
* Gea-Suan Lin wrote:
> On 2023-10-09, The Doctor <doc...@doctor.nl2k.ab.ca> wrote:
>> Any recipes how?
>
> Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
> tried, but not so useful. Still many spam into comp.lang.c and other
> groups.
[...]
> ```
> use Mail::SpamAssassin;
>
> my $sa_agent = Mail::SpamAssassin->new();
>
> sub local_filter_last {
> return unless $hdr{Path} =~ /google-groups\.googlegroups\.com/;
>
> my %myhdr = %hdr;
> delete $myhdr{__BODY__};
> delete $myhdr{__LINES__};
>
> my $header_str = join "\n", map { "$_: $hdr{$_}" } keys %myhdr;
> my $article_str = "$header_str\n\n$hdr{__BODY__}";
>
> my $mail = $sa_agent->parse($article_str);
> my $status = $sa_agent->check($mail);
>
> return reject("Reject Google Groups posting to $hdr{Newsgroups} by SpamAssassin") if $status->is_spam();
>
> $status->finish();
> $mail->finish();
>
> return;
> }
> ```

OK, now you need a ~/.spamassassin directory for your news user and a user_prefs
file in that directory. After that you can start adding rules for Usenet spam.
You will also need to feed several hundred spam and ham articles to
sa-learn --spam and sa-learn --ham respectively, as the news user. After that,
SpamAssassin's results will gradually improve.
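
As a rough sketch of those two steps (not a tested recipe): the script
below creates the directory and an empty user_prefs, then trains Bayes
from two directories of saved articles. The SPAM_DIR and HAM_DIR paths
and the function names are made up for illustration; only the sa-learn
options are real. Run it as the news user.

```shell
#!/bin/sh
# Sketch only. SPAM_DIR and HAM_DIR are hypothetical directories that
# hold one saved article per file; adjust them for your site.
SA_HOME="${SA_HOME:-$HOME/.spamassassin}"
SPAM_DIR="${SPAM_DIR:-$HOME/training/spam}"
HAM_DIR="${HAM_DIR:-$HOME/training/ham}"

# Create the per-user directory and an empty user_prefs if missing.
setup_prefs() {
    mkdir -p "$SA_HOME" || return 1
    [ -f "$SA_HOME/user_prefs" ] || : > "$SA_HOME/user_prefs"
}

# Feed both corpora to the Bayes database, then print its counters.
train() {
    sa-learn --spam "$SPAM_DIR" || return 1
    sa-learn --ham "$HAM_DIR" || return 1
    sa-learn --dump magic
}

# Typical use: setup_prefs && train
```

With default settings SpamAssassin only starts using Bayes once it has
learned at least 200 spam and 200 ham messages; the nspam and nham lines
printed by `sa-learn --dump magic` show the current counts.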

--
Пу́тін — хуйло́
http://www.eternal-september.org

Julien ÉLIE

Oct 12, 2023, 2:58:21 AM
Hi Gea-Suan Lin,

>> Any recipes how?
>
> Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
> tried, but not so useful. Still many spam into comp.lang.c and other
> groups.

FWIW, there's a doc in French to set up a "spamchk" funnel to
SpamAssassin in the newsfeeds file:

https://web.archive.org/web/20230901182332/https://git.alphanet.ch/gitweb/?p=inn-install;a=blob_plain;f=README.html;hb=HEAD#filtrer-le-spam-avec-spamassassin

--
Julien ÉLIE

« Medicus curat, natura sanat. »

Gea-Suan Lin

Oct 12, 2023, 5:00:11 AM
Thanks for the information.

I added a setting to ~/.spamassassin/user_prefs so that MIME parts are
also used as Bayes tokens:

#
bayes_token_sources all

Then I manually selected 200+ ham and 200+ spam articles from comp.lang.c,
plus 50+ spam articles from comp.lang.python and 200+ from sci.crypt.
Afterwards I fed all of them to sa-learn.

The results look pretty good so far. Almost all new spam posted to
comp.lang.c has been blocked by SpamAssassin.

I put my trained files here, so you can just reuse them:

https://newsfeed.hasname.com/files/usenet-spamassassin-20231012.tar.gz

--
Resistance is futile.
https://blog.gslin.org/ & <gs...@gslin.org>

yamo'

Jan 28, 2024, 4:05:51 AM
Hi Julien,


Julien ÉLIE wrote on 12/10/2023 08:58:
> Hi Gea-Suan Lin,
>
>>> Any recipes how?
>>
>> Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
>> tried, but not so useful. Still many spam into comp.lang.c and other
>> groups.
>
> FWIW, there's a doc in French to set up a "spamchk" funnel to
> SpamAssassin in the newsfeeds file:
>
> https://web.archive.org/web/20230901182332/https://git.alphanet.ch/gitweb/?p=inn-install;a=blob_plain;f=README.html;hb=HEAD#filtrer-le-spam-avec-spamassassin
>

The spamchk funnel is slower than calling SpamAssassin in cleanfeed.local.
After some tests, I've adopted the technique from Gea-Suan Lin, which can
be found here:
<http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>


I will update the French documentation:
<https://git.mcos.nc/INN/inn_install>

--
Stéphane
GOOGLE GROUPS USERS, you will soon lose access to Usenet.
<https://support.google.com/groups/answer/11036538>
Free replacement servers: <http://usenet-fr.yakakwatik.org>
Software: <http://usenet-fr.yakakwatik.org/lecteurs-de-news.html>

Ray Banana

Jan 28, 2024, 5:22:54 AM
Thus spake yamo' <ya...@beurdin.invalid>

[...]
> After some tests, I've adopted the technique from Gea-Suan Lin, it could
> be found here :
> <http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>

For performance reasons, especially if you receive a full-text feed, I
would recommend using spamd instead of starting spamassassin for every
article:

    use Mail::SpamAssassin::Client;

    # Placeholders (783 is spamd's default port); set these for your site.
    my ($spamd_host, $spamd_port) = ('localhost', 783);

    my %myhdr = %hdr;
    delete $myhdr{__BODY__};
    delete $myhdr{__LINES__};
    my $header_str = join "\n", map { "$_: $myhdr{$_}" } keys %myhdr;
    my $article_str = "$header_str\n\n$hdr{__BODY__}";
    my $spamtest = Mail::SpamAssassin::Client->new({
        port     => $spamd_port,
        host     => $spamd_host,
        username => 'news'}); # Use ~news/.spamassassin/user_prefs

    # Fall back to an empty result if spamd gave no answer.
    my $result = $spamtest->process($article_str) || {};
    my $score  = $result->{score}  // 'n/a';
    my $isspam = $result->{isspam} // 'False';

    INN::syslog('notice', "$hdr{'Message-ID'} Score: $score, isspam: $isspam");
    if ($isspam eq 'True') {
        # [...] local processing, nocemize etc.
        return 'SPAM';
    } else {
        # [...] local processing
    }
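
The client code above needs a running spamd. A minimal sketch of
starting one and checking a saved article from the shell, assuming spamd
runs on the same host on its default port 783 and can read
~news/.spamassassin; the variable and function names are made up for
illustration:

```shell
#!/bin/sh
# Assumptions: spamd on this host, default port 783. Adjust for your site.
SPAMD_HOST="${SPAMD_HOST:-127.0.0.1}"
SPAMD_PORT="${SPAMD_PORT:-783}"

# Start spamd as a daemon on the chosen port.
start_spamd() {
    spamd -d -p "$SPAMD_PORT"
}

# Check one article file as user "news". spamc -c prints
# "score/threshold" and exits with status 1 if the message is spam.
check_article() {
    spamc -d "$SPAMD_HOST" -p "$SPAMD_PORT" -u news -c < "$1"
}

# Extract the score from a "score/threshold" string.
score_of() {
    printf '%s\n' "${1%%/*}"
}

# Typical use: start_spamd, then: score_of "$(check_article article.txt)"
```

Checking a few known spam and ham articles this way is a quick test that
spamd picks up the news user's trained database before you wire it into
the filter.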



--
Пу́тін — хуйло́
https://www.eternal-september.org

yamo'

Jan 28, 2024, 1:58:23 PM
Hi Ray,

Ray Banana wrote on 28/01/2024 11:22:
> Thus spake yamo' <ya...@beurdin.invalid>
>
> [...]
>> After some tests, I've adopted the technique from Gea-Suan Lin, it could
>> be found here :
>> <http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>
>
> For performance reasons, especially if you receive a full text feed, I
> would recommend to use spamd instead of starting spamassassin for every
> article:


Thanks!

It works, but I need to test it a bit more.