Trying to get per-user statistics working: what am I doing wrong?


Leonardo Boiko

Jul 14, 2016, 12:38:30 PM
to rspamd
My /etc/rspamd/override.d/statistic.conf:

# global classifier with per-language tokens
classifier "bayes" {
    tokenizer {
        name = "osb";
    }
    cache {
        path = "${DBDIR}/learn_cache.global.sqlite";
    }
    name = "global";
    min_tokens = 11;
    min_learns = 200;
    backend = "sqlite3";
    languages_enabled = true;

    statfile {
        symbol = "BAYES_HAM";
        path = "${DBDIR}/bayes.global.ham.sqlite";
        spam = false;
    }
    statfile {
        symbol = "BAYES_SPAM";
        path = "${DBDIR}/bayes.global.spam.sqlite";
        spam = true;
    }
}

# per-user classifier
classifier "bayes" {
    tokenizer {
        name = "osb";
    }
    cache {
        path = "${DBDIR}/learn_cache.users.sqlite";
    }
    name = "users";
    min_tokens = 11;
    min_learns = 200;
    backend = "sqlite3";
    # languages_enabled = true;
    users_enabled = true;

    statfile {
        symbol = "BAYES_HAM";
        path = "${DBDIR}/bayes.users.ham.sqlite";
        spam = false;
    }
    statfile {
        symbol = "BAYES_SPAM";
        path = "${DBDIR}/bayes.users.spam.sqlite";
        spam = true;
    }
}


It does create all six files (two learn caches and four statfiles). Then:

$ echo 'select count(*) from tokens;' | sqlite3 bayes.users.spam.sqlite
0
$ echo 'select * from users;' | sqlite3 bayes.users.spam.sqlite
0|default|0

$ rspamc --classifier=users --deliver=example@example.com learn_spam < ~/sample_spam.eml
Results for file: stdin (0.031 seconds)
success = true;

It does seem to learn something:

$ echo 'select count(*) from tokens;' | sqlite3 bayes.users.spam.sqlite
707
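
The two sqlite3 one-liners used so far (token count and users table) can be bundled into one small script. This is a sketch using Python's standard sqlite3 module; it assumes only the `tokens` and `users` tables already queried in this thread, and the helper name `summarize_statfile` is hypothetical.

```python
from __future__ import annotations

import sqlite3
import sys


def summarize_statfile(conn: sqlite3.Connection) -> tuple[int, list[tuple]]:
    """Return (token count, all rows of the users table) for a statfile.

    Only the `tokens` and `users` tables queried in this thread are assumed.
    """
    (token_count,) = conn.execute("SELECT count(*) FROM tokens").fetchone()
    users = conn.execute("SELECT * FROM users ORDER BY 1").fetchall()
    return token_count, users


def main(path: str) -> None:
    # Open read-only so that inspecting the statfile cannot modify it.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        token_count, users = summarize_statfile(conn)
    finally:
        conn.close()
    print(f"tokens: {token_count}")
    for row in users:
        # Pipe-separated, with NULL shown as empty, like the sqlite3 shell.
        print("|".join("" if v is None else str(v) for v in row))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])  # e.g. bayes.users.spam.sqlite
```

Run it as `python3 summarize.py bayes.users.spam.sqlite` (the script name is arbitrary).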



However, the name column of the new users row seems to be empty:

$ echo 'select * from users;' | sqlite3 bayes.users.spam.sqlite
0|default|0
1||0

# trying another user
$ rspamc --classifier=users --deliver=example2@example.com learn_spam < ~/sample_spam.eml
Results for file: stdin (0.007 seconds)
HTTP error: 404, <AU20160712140011.00003-70961.00016-7159319@168.90.190.39> has been already learned as spam, ignore it

I'm using rspamd 1.2.8-1~jessie on Debian jessie.


Leonardo Boiko

Jul 14, 2016, 12:57:59 PM
to rspamd
Update:

$ rspamc --classifier=users -r example@example.com learn_spam < ~/sample_spam.eml
Results for file: stdin (0.035 seconds)
success = true;

$ rspamc --classifier=users -r exam...@example.com learn_spam < ~/sample_spam.eml
Results for file: stdin (0.035 seconds)
success = true;

$ echo 'select * from users;' | sqlite3 bayes.users.spam.sqlite
0|default|0
1||0
2|example@example.com|0
3|example2@example.com|0
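
To see which users row the learned tokens actually landed under (for example, whether the 707 tokens from the --deliver run belong to the row with the empty name), the two tables can be joined. This sketch assumes that the `tokens` table has a `user` column referencing `users.id`; that matches my understanding of the rspamd sqlite3 backend schema but is not shown anywhere in this thread. The helper name `tokens_per_user` is hypothetical.

```python
import sqlite3
import sys


def tokens_per_user(conn: sqlite3.Connection) -> list:
    """Return (user id, user name, token count) for every row in `users`.

    ASSUMPTION: tokens.user references users.id (rspamd sqlite3 schema).
    The LEFT JOIN keeps users with no tokens yet, reporting 0 for them.
    """
    query = """
        SELECT u.id, u.name, count(t.user)
        FROM users AS u
        LEFT JOIN tokens AS t ON t.user = u.id
        GROUP BY u.id, u.name
        ORDER BY u.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Open the statfile read-only, e.g. bayes.users.spam.sqlite.
    conn = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)
    try:
        for uid, name, count in tokens_per_user(conn):
            # An empty or NULL name is printed as <empty> so it stands out.
            print(f"{uid}\t{name or '<empty>'}\t{count}")
    finally:
        conn.close()
```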



It seems that the option that selects the user for per-user Bayes training is '-r' (--rcpt), not '-d' (--deliver), contrary to the manpage and https://rspamd.com/doc/configuration/statistic.html. Is that right?