
Bayes expire files


Julien Gormotte

Oct 19, 2011, 3:47:45 AM
Hello everyone,

I have had a problem for some time now (I don't know exactly when it
began): SpamAssassin leaves lots of bayes_toks.expirexxxx files behind.
I had no such files in August (I have a backup from that period), so
this seems new...

The problem may come from a recent update, as I update my system pretty
often. I haven't changed much on this mail server, and I don't think I
touched the amavis config...

It became a real problem when I got an alarm because my filesystem was
full: these files were taking up more than 35 GB...

So, does anyone know how to get rid of these? Is this a new feature?
What are these files?

I'm running amavisd-new-2.7.0 on FreeBSD 8.2, with package
p5-Mail-SpamAssassin-3.3.2_2 (and a few others).

Gorn

Oct 19, 2011, 4:56:38 AM

Julien Gormotte

Oct 19, 2011, 6:51:24 AM
On Wed, 19 Oct 2011 10:56:38 +0200,
"Gorn" <Go...@xs4all.nl> wrote:

>
> http://old.nabble.com/bayes_toks.expire-problem-td22502372.html
>
>

I have already tried some of the approaches suggested there, without
much success.
This is pretty strange:

root@rei ~ % zfs list | grep amavis
tank/mails/amavisdb   90,0M  1,91G  90,0M  /usr/jails/mail/var/amavis

root@rei ~ % du -sh /usr/jails/mail/var/amavis
 90M    /usr/jails/mail/var/amavis

root@rei ~ % ls -lah /usr/jails/mail/var/amavis/.spamassassin/
total 92138
drwx------ 2 110 110 9B 19 oct 12:35 .
drwxr-xr-x 6 110 110 9B 19 oct 12:44 ..
-rw------- 1 110 110 25K 19 oct 12:46 bayes_journal
-rw------- 1 110 110 4,9M 19 oct 12:35 bayes_seen
-rw------- 1 110 110 39M 19 oct 12:35 bayes_toks
-rw------- 1 110 110 65M 18 oct 11:31 bayes_toks.expire10463
-rw------- 1 110 110 128T 19 oct 12:16 bayes_toks.expire21624
-rw------- 1 110 110 8,0T 19 oct 11:35 bayes_toks.expire64012
-rw-r----- 1 110 110 109B 18 oct 11:31 razor-agent.log

On an 80 GB disk, that's very good compression :)
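The contrast between `ls` (128T and 8,0T) and `du`/`zfs list` (90M for the whole dataset) suggests the expire files are sparse: their apparent size is enormous, but almost no disk blocks are actually allocated. A minimal Python sketch to compare the two numbers per file, assuming a POSIX system where `os.stat()` reports `st_blocks` in 512-byte units (as on Linux and FreeBSD). `report_expire_files` is a hypothetical helper, and the path is an example to adjust.

```python
import glob
import os


def report_expire_files(directory):
    """Return (name, apparent_bytes, allocated_bytes) for each bayes_toks.expire* file.

    apparent_bytes is what `ls -l` shows (st_size); allocated_bytes approximates
    what `du` counts (st_blocks, in 512-byte units on Linux and FreeBSD).
    """
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, "bayes_toks.expire*"))):
        st = os.stat(path)
        rows.append((os.path.basename(path), st.st_size, st.st_blocks * 512))
    return rows


if __name__ == "__main__":
    # Example path (hypothetical for your setup): the amavis Bayes directory.
    for name, apparent, allocated in report_expire_files("/var/amavis/.spamassassin"):
        flag = "  (sparse?)" if allocated < apparent else ""
        print(f"{name}: apparent={apparent} allocated={allocated}{flag}")
```

A file whose allocated size is far below its apparent size is mostly holes, which is how a "128T" file can sit on an 80 GB disk.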

In debug mode, I get a lot of:

Oct 19 12:16:16.061 [21624] dbg: locker: refresh_lock: refresh /var/amavis/.spamassassin/bayes.lock

and after some time:
HASH: Out of overflow pages. Increase page size
Segmentation fault (core dumped)

Julien

Mark Martinec

Oct 20, 2011, 3:00:51 PM
Julien,

> "Gorn" <Go...@xs4all.nl> wrote:
> > http://old.nabble.com/bayes_toks.expire-problem-td22502372.html
>
> I already tried some of the ways indicated here, and nothing very
> good..

You did try disabling auto-expire and running it manually,
as indicated in that thread?

> root@rei ~ % ls -lah /usr/jails/mail/var/amavis/.spamassassin/
> total 92138
> drwx------ 2 110 110 9B 19 oct 12:35 .
> drwxr-xr-x 6 110 110 9B 19 oct 12:44 ..
> -rw------- 1 110 110 25K 19 oct 12:46 bayes_journal
> -rw------- 1 110 110 4,9M 19 oct 12:35 bayes_seen
> -rw------- 1 110 110 39M 19 oct 12:35 bayes_toks
> -rw------- 1 110 110 65M 18 oct 11:31 bayes_toks.expire10463
> -rw------- 1 110 110 128T 19 oct 12:16 bayes_toks.expire21624
> -rw------- 1 110 110 8,0T 19 oct 11:35 bayes_toks.expire64012
> -rw-r----- 1 110 110 109B 18 oct 11:31 razor-agent.log
>
> On a 80 GB disk, this is a very good compression :)

:-)

If a temporary tokens database gets so much larger than
the original database is, my guess is that the current database
is corrupted.
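Mark's rule of thumb (a temporary tokens database far larger than the live one points to a corrupted database) can be turned into a quick check. A sketch assuming the DBM file layout from the listing above; `suspicious_expire_files` and its default `factor` of 10 are hypothetical choices for illustration, not SpamAssassin settings.

```python
import glob
import os


def suspicious_expire_files(directory, factor=10):
    """Flag bayes_toks.expire* files whose apparent size exceeds
    `factor` times the size of the live bayes_toks database.

    Raises FileNotFoundError if bayes_toks itself is missing.
    """
    main_db = os.path.join(directory, "bayes_toks")
    base = os.path.getsize(main_db)
    flagged = []
    for path in sorted(glob.glob(main_db + ".expire*")):
        size = os.path.getsize(path)  # apparent size, as shown by `ls -l`
        if size > factor * base:
            flagged.append((os.path.basename(path), size))
    return flagged
```

Applied to the listing above (bayes_toks at 39M), it would flag bayes_toks.expire21624 and bayes_toks.expire64012, but not the 65M bayes_toks.expire10463.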

> On debug mode, I got a lot of :
> Oct 19 12:16:16.061 [21624] dbg: locker: refresh_lock:
> refresh /var/amavis/.spamassassin/bayes.lock
>
> and after some time :
> HASH: Out of overflow pages. Increase page size
> Segmentation fault (core dumped)

For Bayes databases of any substantial size, an SQL-based Bayes store
is usually faster and more reliable. Instructions are in the sql
directory of the SpamAssassin distribution (files README.bayes and
bayes_mysql.sql or bayes_pg.sql). Choose either MySQL with InnoDB
and Mail::SpamAssassin::BayesStore::MySQL as bayes_store_module,
or a fairly recent version of PostgreSQL. With Bayes on SQL it is usually
just fine to leave auto-expiry enabled.

As long as the rest of your SA rules and network tests are good,
it is not a big deal to start a new Bayes database from scratch and
leave it to auto-learning. For the first couple of hours it may be
prudent to lower the scores of the BAYES_00 and BAYES_99 rules.

Btw, if starting from scratch, it is also a good idea to set:
bayes_auto_learn_on_error 1
(introduced with SpamAssassin 3.3).
See Mail::SpamAssassin::Plugin::AutoLearnThreshold man page
for a description of this setting.

Mark

Julien Gormotte

Oct 21, 2011, 3:02:06 AM
On Thu, 20 Oct 2011 21:00:51 +0200,
Mark Martinec <Mark.Marti...@ijs.si> wrote:

> Julien,
>
> > "Gorn" <Go...@xs4all.nl> wrote:
> > > http://old.nabble.com/bayes_toks.expire-problem-td22502372.html
> >
> > I already tried some of the ways indicated here, and nothing very
> > good..
>
> You did try disabling auto-expire and running it manually,
> as indicated in that thread?

Yes, I set:
bayes_expiry_max_db_size 300000
bayes_auto_expire 0

and ran:
sa-learn --force-expire

It ran for quite some time, and that is how I ended up with these huge
files. Before that, the files were using "just" 34 GB.

>
> > root@rei ~ % ls -lah /usr/jails/mail/var/amavis/.spamassassin/
> > total 92138
> > drwx------ 2 110 110 9B 19 oct 12:35 .
> > drwxr-xr-x 6 110 110 9B 19 oct 12:44 ..
> > -rw------- 1 110 110 25K 19 oct 12:46 bayes_journal
> > -rw------- 1 110 110 4,9M 19 oct 12:35 bayes_seen
> > -rw------- 1 110 110 39M 19 oct 12:35 bayes_toks
> > -rw------- 1 110 110 65M 18 oct 11:31 bayes_toks.expire10463
> > -rw------- 1 110 110 128T 19 oct 12:16 bayes_toks.expire21624
> > -rw------- 1 110 110 8,0T 19 oct 11:35 bayes_toks.expire64012
> > -rw-r----- 1 110 110 109B 18 oct 11:31 razor-agent.log
> >
> > On a 80 GB disk, this is a very good compression :)
>
> :-)
>
> If a temporary tokens database gets so much larger than
> the original database is, my guess is that the current database
> is corrupted.

I tried running:
sa-learn --clear

and then:
sa-learn --force-expire

It did not remove the expire files, so I deleted them manually. I'll see
what happens.
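In the listing above, the suffix of bayes_toks.expire21624 matches the process ID [21624] in the debug log, which suggests each expiry run names its temporary file after its own PID. Assuming that, a minimal Python sketch could remove only the expire files whose process is no longer running, rather than deleting them blindly by hand. `pid_is_running` and `remove_stale_expire_files` are hypothetical helpers; it is safest to run this while amavisd and sa-learn are stopped.

```python
import glob
import os
import re

# Assumption: the numeric suffix is the PID of the process that created the file.
EXPIRE_RE = re.compile(r"^bayes_toks\.expire(\d+)$")


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_expire_files(directory, is_running=pid_is_running, dry_run=True):
    """Delete bayes_toks.expire<PID> files whose PID is no longer alive.

    With dry_run=True (the default) nothing is deleted; the candidates are only returned.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(directory, "bayes_toks.expire*"))):
        m = EXPIRE_RE.match(os.path.basename(path))
        if not m or is_running(int(m.group(1))):
            continue  # keep unrecognized names and files owned by a live process
        if not dry_run:
            os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

Running it once with the default dry_run=True shows what it would delete; a file whose PID has been reused by an unrelated process is simply kept.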

>
> > On debug mode, I got a lot of :
> > Oct 19 12:16:16.061 [21624] dbg: locker: refresh_lock:
> > refresh /var/amavis/.spamassassin/bayes.lock
> >
> > and after some time :
> > HASH: Out of overflow pages. Increase page size
> > Segmentation fault (core dumped)
>
> For bayes databases of any substantial size choosing an SQL-based
> bayes usually offers a faster and more reliable operation.
> Instructions are in the sql directory of the SpamAssassin
> distribution (files README.bayes and bayes_mysql.sql or
> bayes_pg.sql). Choose either an MySQL with InnoDB and
> Mail::SpamAssassin::BayesStore::MySQL as bayes_store_module, or a
> fairly recent version of PostgreSQL. With a bayes on SQL it is
> usually just fine to leave auto-expiry enabled.

I'll see what happens after these last operations; it may be a good
idea to try the SQL backend afterwards.

>
> As long as the rest of your SA rules and network tests are good,
> it is not a big deal to start a new bayes database from scratch and
> leaving it to auto-learning. For the first couple of hours it may be
> prudent to lower the scores of BAYES_00 and BAYES_99 rules.
>
> Btw, if starting from scratch, it is also a good idea to set:
> bayes_auto_learn_on_error 1
> (introduced with SpamAssassin 3.3).
> See Mail::SpamAssassin::Plugin::AutoLearnThreshold man page
> for a description of this setting.
>
> Mark

I'll look into this as soon as I can, thanks for the
advice :)
