
Blocking Usenet spam on the INN side with Mail::SpamAssassin...


Gea-Suan Lin

Oct 12, 2023, 9:29:50 AM
Lately comp.lang.c and sci.crypt have been flooded with spam coming
from Google Groups (GG), so I looked into how to use Mail::SpamAssassin
to block GG spam. The important part is this (it goes in
`cleanfeed.local`):

```
# vim: set syntax=perl:

use Mail::SpamAssassin;

my $sa_agent = Mail::SpamAssassin->new();

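sub local_filter_last {
    # Only check articles that arrived through Google Groups.
    return unless $hdr{Path} =~ /google-groups\.googlegroups\.com/;

    # Copy the headers, dropping the %hdr entries that hold the body
    # and the line count rather than real headers.
    my %myhdr = %hdr;
    delete $myhdr{__BODY__};
    delete $myhdr{__LINES__};

    # Rebuild a message (headers, blank line, body) for SpamAssassin.
    my $header_str = join "\n", map { "$_: $myhdr{$_}" } keys %myhdr;
    my $article_str = "$header_str\n\n$hdr{__BODY__}";

    my $mail = $sa_agent->parse($article_str);
    my $status = $sa_agent->check($mail);

    # Read the verdict first, then release both objects so that the
    # reject path does not skip the finish() calls.
    my $is_spam = $status->is_spam();
    my $score = $status->get_score();
    $status->finish();
    $mail->finish();

    if ($is_spam) {
        my $reason = sprintf 'Reject Google Groups posting to %s by SpamAssassin (score %s)',
            $hdr{Newsgroups}, $score;
        return reject($reason);
    }

    return;
}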

1;
```

But this alone does not work very well. You also need to dump spam and
ham (junk and legitimate articles) and train on them with sa-learn.
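
A minimal sketch of that training step, assuming the dumped articles sit
one per file in two directories. The helper names and the script name in
the usage note are hypothetical; `--spam` and `--ham` are sa-learn's own
options. It also assumes the default per-user Bayes database, so it
should probably run under the same account as INN, otherwise the filter
would read a different database from the one being trained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the sa-learn command line for one class of articles.
# --spam and --ham are sa-learn options; the directory is wherever
# the articles were dumped (an assumption of this sketch).
sub sa_learn_command {
    my ($class, $dir) = @_;
    die "class must be 'spam' or 'ham'\n"
        unless $class eq 'spam' || $class eq 'ham';
    return ('sa-learn', "--$class", $dir);
}

# Train on both directories, stopping at the first failure.
sub train {
    my ($spam_dir, $ham_dir) = @_;
    for my $job (['spam', $spam_dir], ['ham', $ham_dir]) {
        my @cmd = sa_learn_command(@$job);
        print "Running: @cmd\n";
        system(@cmd) == 0 or die "@cmd failed (status $?)\n";
    }
    return;
}

# Only train when both directories are given on the command line.
train(@ARGV) if @ARGV == 2;
```

Saved as, say, `sa-train.pl`, it would be run as
`perl sa-train.pl /path/to/spam /path/to/ham`.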

After collecting 200+ spams and 200+ hams from comp.lang.c, plus 50+
spams from comp.lang.python and 200+ spams from sci.crypt, the trained
data turned out to be quite accurate. The server's
/var/log/news/news.notice shows that almost everything Cleanfeed failed
to stop was caught by SpamAssassin.
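
One way to check this is to count the rejections per newsgroup. The
sketch below assumes the reason string built by the filter's sprintf
appears verbatim in the log line; the rest of the line format is not
assumed. The function name `count_sa_rejects` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count SpamAssassin rejections per newsgroup from an open log filehandle.
sub count_sa_rejects {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # Assumes the reason string from the filter is logged as-is.
        next unless $line =~ /Reject Google Groups posting to (\S+) by SpamAssassin/;
        # A crossposted article lists several groups separated by commas.
        $count{$_}++ for split /,/, $1;
    }
    return \%count;
}

# With a log path on the command line, print one "count group" line
# per newsgroup.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $count = count_sa_rejects($fh);
    close $fh;
    printf "%6d %s\n", $count->{$_}, $_ for sort keys %$count;
}
```

For example: `perl count-rejects.pl /var/log/news/news.notice`, where
the script name is again made up.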

--
Resistance is futile.
https://blog.gslin.org/ & <gs...@gslin.org>