SpamAssassin & sa-learn

Dave S

Apr 1, 2003, 6:54:43 AM
Is anyone using Bayesian learning (sa-learn) with SpamAssassin?

My mailboxes have all the emails in one file. Can I forward those files
to sa-learn, or do I need to send the emails separately? Or can I set
CGP to save my emails separately if I need to?

Not done this before and thought I'd check with the list before I screw
something up!


Thanks

Dave S


#############################################################
This message is sent to you because you are subscribed to
the mailing list <CGat...@mail.stalker.com>.
To unsubscribe, E-mail to: <CGateP...@mail.stalker.com>
To switch to the DIGEST mode, E-mail to <CGatePr...@mail.stalker.com>
To switch to the INDEX mode, E-mail to <CGatePr...@mail.stalker.com>
Send administrative queries to <CGatePro...@mail.stalker.com>

Daniel M. Zimmerman

Apr 1, 2003, 6:59:45 AM

--On Tuesday, 1 April 2003 12:54 +0100 Dave S <dave_s...@pickering.co.uk>
wrote:

> Is anyone using the bayesian learning (sa-learn) with spamassassin?

I am.

> my mailboxes have all the emails in one file. Can I forward those files
> to sa-learn or do I need to send the emails separately? Or can I set CGP
> to save my emails separately if I need to?

You can point sa-learn to an .mbox file, just fine. Or, you can set CGP to
make Maildir mailboxes and then point sa-learn to the individual files.
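
Daniel's answer means you do not have to split the file before training. If you do want one file per message anyway (for example, to inspect or hand-pick individual messages first), here is a minimal awk sketch. It assumes a traditional mbox in which every message begins with a line starting with "From "; the function name split_mbox and the numbered output file names are hypothetical, and ">From " escaping inside message bodies is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: split_mbox MBOX OUTDIR
# Writes each message of a traditional mbox file to its own numbered file
# in OUTDIR (00001.msg, 00002.msg, ...). Assumes every message starts with
# a line beginning "From " (the mbox postmark line); that line is dropped
# because it is not part of the message itself.
split_mbox() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^From / {                      # a new message starts here
            if (out != "") close(out)   # finish the previous file
            n++
            out = sprintf("%s/%05d.msg", dir, n)
            next                        # skip the postmark line
        }
        out != "" { print > out }       # copy header and body lines
    ' "$mbox"
}
```

Daniel's other option, letting CGP keep the folder as a one-file-per-message .mdir, gives you the same result without any splitting.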

At some point, I'll be putting some more automated (or at least friendlier,
not requiring a command line and root access :) sa-learn functionality in
CGPSA; unfortunately, that point is likely to be a decent ways off... In
the meantime, I've been using the Bayes stuff from the command line. I
built up a several-thousand message spam corpus over a year or so, which -
combined with my incredible amount of archived real email - was a good
basis for training. :)

-Dan

------------------------------------------------------------------
Daniel M. Zimmerman TFF Enterprises
M/S 256-80 - Caltech http://www.tffenterprises.com/
Pasadena, California 91125 USA d...@tffenterprises.com

Dave S

Apr 1, 2003, 8:49:23 AM

On Tuesday, April 1, 2003, at 12:59 pm, Daniel M. Zimmerman wrote:
> At some point, I'll be putting some more automated (or at least
> friendlier, not requiring a command line and root access :) sa-learn
> functionality in CGPSA; unfortunately, that point is likely to be a
> decent ways off... In the meantime, I've been using the Bayes stuff
> from the command line. I built up a several-thousand message spam
> corpus over a year or so, which - combined with my incredible amount
> of archived real email - was a good basis for training. :)

Cheers!

I've not tried your CGPSA yet but probably will the next time I change
anything.

Has it got functionality for MySQL userprefs too? That would be
really nice :-)

Dave S

Stefan Seiz

Apr 1, 2003, 9:50:16 AM
On 01.04.2003 13:59, Daniel M. Zimmerman <dmz-...@tffenterprises.com>
wrote:

>> my mailboxes have all the emails in one file. Can I forward those files
>> to sa-learn or do I need to send the emails separately? Or can I set CGP
>> to save my emails separately if I need to?
>
> You can point sa-learn to an .mbox file, just fine. Or, you can set CGP to
> make Maildir mailboxes and then point sa-learn to the individual files.
>

> At some point, I'll be putting some more automated (or at least friendlier,
> not requiring a command line and root access :) sa-learn functionality in
> CGPSA; unfortunately, that point is likely to be a decent ways off... In
> the meantime, I've been using the Bayes stuff from the command line. I
> built up a several-thousand message spam corpus over a year or so, which -
> combined with my incredible amount of archived real email - was a good
> basis for training. :)

Training SA Bayes is as simple as the following on the command line:

  sa-learn --spam --mbox /var/CommuniGate/Accounts/YourAccnt.macnt/spam.mbox

to learn some spam from an mbox, and

  sa-learn --nonspam /var/CommuniGate/Accounts/YourAccnt.macnt/INBOX.mdir/*

to learn some ham (non-spam) from a maildir.
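
Stefan's two commands can be wrapped in a small shell function so one account is trained in a single step. This is a sketch under assumptions: the account layout follows his example paths, spam.mbox is a folder you created yourself, and train_account and the SA_LEARN override are hypothetical names. It passes --mbox for the single-file mailbox (without it, sa-learn treats the whole file as one message) and uses find with -exec ... {} + instead of a shell glob, so a large .mdir folder cannot overflow the command-line length limit.

```shell
#!/bin/sh
# Hypothetical wrapper: train_account ACCOUNT_DIR
# Trains Bayes from one CommuniGate account: spam.mbox as spam,
# INBOX.mdir as ham. SA_LEARN may point at another binary (for example
# a test stub); it defaults to sa-learn.
train_account() {
    acct=$1
    sa=${SA_LEARN:-sa-learn}

    # Single-file mailbox: --mbox makes sa-learn split it into messages
    # instead of learning the whole file as one message.
    if [ -f "$acct/spam.mbox" ]; then
        "$sa" --spam --mbox "$acct/spam.mbox" || return 1
    fi

    # One-file-per-message folder: every regular file is assumed to be a
    # message (as with Stefan's INBOX.mdir/* glob); find batches the file
    # names so a large folder cannot exceed the argument-length limit.
    if [ -d "$acct/INBOX.mdir" ]; then
        find "$acct/INBOX.mdir" -type f -exec "$sa" --nonspam {} + || return 1
    fi
}
```

For example, train_account /var/CommuniGate/Accounts/YourAccnt.macnt. Run it as the user whose Bayes database you want trained, since sa-learn writes to the invoking user's database by default.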


Once enough has been learned (by default at least 200 spam and 200 ham
messages), SA >= 2.5 will just start considering Bayes automagically when
scanning...

--
Stefan Seiz <http://www.StefanSeiz.com>
Spamto: <b...@imd.net>

Mike Yrabedra

Apr 1, 2003, 10:31:00 AM
on 4/1/03 9:50 AM, Stefan Seiz at Talk...@index-s.de wrote:

> sa-learn --spam --mbox /var/CommuniGate/Accounts/YourAccnt.macnt/spam.mbox


I don't see the spam.mbox

Is this a special mailbox you set up to have spam routed to?

And if so, do you just manually empty it every once in a while?

+--------------------------------------------+
Mike Yrabedra (President)
323 Incorporated
Home of MacDock, MacAgent and MacSurfshop
+--------------------------------------------+
W: http://www.323inc.com/
P: 770.382.1195
F: 734.448.5164
E: mi...@323inc.com
I: ichatmacdock
+--------------------------------------------+
"Whatever you do, work at it with all your heart,
as working for the Lord, not for men."
~Colossians 3:23 <{{{><
+--------------------------------------------+

Stefan Seiz

Apr 1, 2003, 10:35:59 AM
On 01.04.2003 17:31, Mike Yrabedra <mi...@323inc.com> wrote:

> I don't see the spam.mbox
>
> Is this a special mailbox you set up to have spam routed to?

Yes, simply an IMAP folder I store spam in.

> And if so, do you just manually empty it every once in a while?

No, I keep it archived there. sa-learn remembers the messages it has already
learned.

--
Stefan Seiz <http://www.StefanSeiz.com>
Spamto: <b...@imd.net>

Mike Yrabedra

Apr 1, 2003, 10:39:38 AM
on 4/1/03 10:35 AM, Stefan Seiz at Talk...@index-s.de wrote:

>> And if so, do you just manually empty it every once in a while?
>
> No, I keep it archived there. sa-learn remembers the messages it has already
> learned.


Doesn't that folder get quite LARGE ;-)

So once you run the sa-learn command once, that's it? From then on it will
learn your spam and ham?

Stefan Seiz

Apr 1, 2003, 11:36:58 AM
On 1.4.2003 17:39, Mike Yrabedra <mi...@323inc.com> wrote:

> Doesn't that folder get quite LARGE ;-)

Who cares?

> So once you run the sa-learn command once, that's it? From then on it will
> learn your spam and ham?

No, as soon as you have new spam and you think it is worth it, you run another
sa-learn command...
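
Because sa-learn skips messages it has already learned, re-running the same command over the archived spam folder is safe. Below is a hedged sketch of a helper you could call from cron: it only invokes sa-learn when the mbox has changed since the last successful run, tracked with a stamp file. retrain_if_changed, the stamp file and SA_LEARN are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: retrain_if_changed MBOX STAMP
# Runs "sa-learn --spam --mbox MBOX" only if MBOX was modified after the
# last successful run, which is recorded by touching the STAMP file.
# SA_LEARN may point at another binary (for example a test stub).
retrain_if_changed() {
    mbox=$1
    stamp=$2
    sa=${SA_LEARN:-sa-learn}

    # Skip when a stamp exists and find does not report the mbox as newer.
    if [ -f "$stamp" ] && [ -z "$(find "$mbox" -newer "$stamp")" ]; then
        return 0
    fi

    # Learn, and record the run only if sa-learn succeeded.
    "$sa" --spam --mbox "$mbox" && touch "$stamp"
}
```

The stamp check is only an optimization that avoids re-reading an unchanged archive; since sa-learn skips what it already knows, running the command unconditionally gives the same training result.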

How about a little "man Mail::SpamAssassin"? ;-)

--
Stefan Seiz <http://www.stefanseiz.com>
Spamto: <b...@imd.net>

Mike Yrabedra

Apr 1, 2003, 11:40:53 AM
on 4/1/03 11:36 AM, Stefan Seiz at Talk...@index-s.de wrote:

> How about a little "man Mail::SpamAssassin"? ;-)


Sorry to have bothered you.

Stefan Seiz

Apr 1, 2003, 11:55:42 AM
On 1.4.2003 18:40, Mike Yrabedra <mi...@323inc.com> wrote:

>> How about a little "man Mail::SpamAssassin"? ;-)
>
> Sorry to have bothered you.

No problem. No offense intended - really. I just thought it wouldn't hurt.

In my *personal* experience, reading the docs has helped me understand how
things are done much better than simply having someone tell me how to do
it... I know time is precious, but reading a man page often takes less time
than writing an email and chewing on the responses.

--
<http://www.StefanSeiz.com>
Spamto: <b...@imd.net>

Daniel M. Zimmerman

Apr 1, 2003, 2:11:11 PM
--On Tuesday, 1 April 2003 14:49 +0100 Dave S <dave_s...@pickering.co.uk>
wrote:

> I've not tried your CGPSA yet but probably will the next time I change
> anything.
>
> Has it got functionality for MySQL userprefs too? That would be really
> nice :-)

No, it doesn't; I considered doing it that way at first, but I decided
against it for what I think is a pretty logical reason. With the way CGPSA does
userprefs now, the user preferences (and state) live _in_ the user's
CommuniGate account directory. This means that if the user is renamed, the
preferences stay with them; if the user is deleted, the preferences are
deleted. To use MySQL user prefs would require some sort of synchronization
process to keep the prefs database up to date, remove unused entries,
etcetera, which is more trouble than it's worth when you could just use
flat files in known "home" directories. Besides which, the state files
(bayes database, auto-whitelist) can't be stored in MySQL yet, so you'd
still have to use the user home directories for something...

-Dan
