autolearn spam and ham for redis

Anton Onischenko

Sep 7, 2018, 10:53:54 AM

to rspamd

Hello Everyone!

I installed the new version of rspamd 1.8.0

But I could not make it autolearn.

/usr/local/etc/rspamd# rspamc stat

Results for command: stat (0.054 seconds)

Messages scanned: 51

Messages learned: 0

Fuzzy hashes in storage "rspamd.com": 150303023

Fuzzy hashes stored: 150303023

Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1884; users: 1; languages: 0

Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1582; users: 1; languages: 0

Total learns: 3466

----

I tried to check my email:

/usr/local/etc/rspamd# rspamc -p -t 7 -F "w...@mail.ru" -r "ww...@mail.ru" -i "1.1.1.1" --hostname "mail.ru" --helo "mail.ru" < /var/qmail/new/05363273382341361400

Results for file: stdin (0.540 seconds)

[Metric: default]

Action: reject

Spam: true

Score: 13.12 / 8.00

Symbol: ARC_NA (0.00)

Symbol: ASN (0.00)[asn:13335, ipnet:1.1.1.0/24, country:US]

Symbol: DKIM_TRACE (0.00)[]

Symbol: DMARC_POLICY_REJECT (2.00)[: No valid SPF, No valid DKIM, reject]

Symbol: ENVFROM_SERVICE_ACCT (1.00)

Symbol: FORGED_RECIPIENTS (2.00)[ ww...@mail.ru]

Symbol: FORGED_SENDER (0.30)[ w...@mail.ru]

Symbol: FREEMAIL_ENVFROM (0.00)[mail.ru]

Symbol: FREEMAIL_ENVRCPT (0.00)[mail.ru]

Symbol: FROM_HAS_DN (0.00)

Symbol: FROM_NEQ_ENVFROM (0.00)[w...@mail.ru]

Symbol: HAS_LIST_UNSUB (-0.01)

Symbol: HTML_SHORT_LINK_IMG_3 (0.50)

Symbol: MANY_INVISIBLE_PARTS (0.40)[5]

Symbol: MID_RHS_NOT_FQDN (0.50)

Symbol: MIME_BASE64_TEXT (0.10)

Symbol: MIME_GOOD (-0.10)[multipart/alternative, text/plain]

Symbol: MY_FREE_MAIL (0.50)[w...@mail.ru]

Symbol: PHISHING (3.93)[verisign->verisigninc]

Symbol: RCPT_COUNT_ONE (0.00)[1]

Symbol: RCVD_COUNT_THREE (0.00)[3]

Symbol: RCVD_NO_TLS_LAST (0.00)

Symbol: REPLYTO_ADDR_EQ_FROM (0.00)

Symbol: R_DKIM_REJECT (1.00)[mail.ru]

Symbol: R_SPF_SOFTFAIL (0.00)[~all]

Symbol: TO_DN_NONE (0.00)

Symbol: URI_COUNT_ODD (1.00)[23]

Message-ID: 198792dc6e924ed7b965e6a98e4d7119@425

Message - spf: (SPF): spf softfail

----

After that, I ran stat again and saw the same count of learned messages.

/usr/local/etc/rspamd# rspamc stat

Results for command: stat (0.048 seconds)

Messages scanned: 52

Messages learned: 0

Connections count: 0

Control connections count: 0

Pools allocated: 28

Pools freed: 0

Bytes allocated: 24.87M

Memory chunks allocated: 204

Shared chunks allocated: 16

Chunks freed: 0

Oversized chunks: 1

Fuzzy hashes in storage "rspamd.com": 150311085

Fuzzy hashes stored: 150311085

Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1884; users: 1; languages: 0

Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1582; users: 1; languages: 0

Total learns: 3466

---

But if I force learning with learn_spam:

/usr/local/etc/rspamd# rspamc -p learn_spam < /var/new/05363273382341361400

the "Messages learned" count in rspamc stat increases.

What is wrong with my configs?

/usr/local/etc/rspamd# cat statistic.conf

classifier "bayes" {
  tokenizer {
    name = "osb";
  }
  name = "common";
  cache {
    #path = "${DBDIR}/learn_cache.sqlite";
    type="redis"
  }
  backend = "redis";
  servers = "127.0.0.1:6379";
  languages_enabled = true;
  min_tokens = 11;
  min_learns = 200;
  autolearn = [-4, 7]

  statfile {
    symbol = "BAYES_HAM";
    #path = "${DBDIR}/bayes.ham.sqlite";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    #path = "${DBDIR}/bayes.spam.sqlite";
    spam = true;
  }

  learn_condition =<<EOD
return function(task, is_spam, is_unlearn)
  local prob = task:get_mempool():get_variable('bayes_prob', 'double')

  if prob then
    local in_class = false
    local cl
    if is_spam then
      cl = 'spam'
      in_class = prob >= 0.95
    else
      cl = 'ham'
      in_class = prob <= 0.05
    end

    if in_class then
      return false,string.format('already in class %s; probability %.2f%%',
        cl, math.abs((prob - 0.5) * 200.0))
    end
  end

  return true
end
EOD

  .include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/classifier-bayes.conf"
  # .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/classifier-bayes.conf"
}

#.include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/statistic.conf"

#.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"

---

/usr/local/etc/rspamd# cat local.d/classifier-bayes.conf

backend = "redis";

new_schema = true;

Manuel Garbin

Sep 7, 2018, 11:48:18 AM

to rspamd

You have to use the rspamc learn_spam or rspamc learn_ham command.

If you have several Bayes classifiers:

rspamc -c bayes_global learn_spam ...
rspamc -c bayes_user -d us...@example.com learn_spam

Alexander Moisseev

Sep 7, 2018, 2:39:32 PM

to rsp...@googlegroups.com

On 07.09.18 17:53, Anton Onischenko wrote:
> Hello Everyone!
> I installed the new version of rspamd 1.8.0
> But I could not make it autolearn.

Autolearn doesn't work with rspamc or WebUI scans. You need to send email.
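
One way to check whether a message delivered through the MTA was autolearned is to compare the per-statfile "learned" counters from rspamc stat before and after delivery. The following is only a minimal sketch: it assumes the stat output format shown earlier in this thread, and learned_counts is a hypothetical helper name, not part of rspamd.

```shell
#!/bin/sh
# Hypothetical helper (not part of rspamd): reads `rspamc stat` output on
# stdin and prints "<statfile> <learned>" for every "Statfile:" line.
learned_counts() {
    awk '/^Statfile:/ {
        name = $2
        for (i = 3; i <= NF; i++) {
            if ($i == "learned:") {
                count = $(i + 1)
                sub(/;$/, "", count)   # drop the trailing ";"
                print name, count
            }
        }
    }'
}

# Possible usage on a live system (assumes rspamc can reach the controller):
#   rspamc stat | learned_counts > before.txt
#   ... deliver a test message through the MTA ...
#   rspamc stat | learned_counts > after.txt
#   diff before.txt after.txt
```

If the two snapshots are identical after a real delivery, the message was not learned, either because its score stayed between the autolearn thresholds or because the learn condition rejected it.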

>
> What is wrong with my configs?
>

It should work, but changing the stock configuration files is not recommended. Create your configuration files in local.d/ or override.d/ instead.

If you need just one classifier, create local.d/classifier-bayes.conf:

backend = "redis";
servers = "localhost:6379";
new_schema = true;
autolearn = [-5, 7];

BTW, using autolearn is not advisable, because it learns false positives as well, especially with such low thresholds.

Autolearn could be useful for learning rejected messages. IMO relatively safe thresholds for that are [-20, 20] (ham will never hit the '-20' threshold, so it could be -9999 as well).
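
To see what the thresholds mean for the message scanned earlier in this thread (score 13.12), the score check can be modelled roughly as below. This is a simplified sketch, not rspamd's implementation: autolearn_verdict is a hypothetical name, the inclusive comparison is an assumption, and the real decision also depends on the learn condition and on the message arriving as regular mail rather than through rspamc.

```shell
#!/bin/sh
# Simplified model of the score check behind `autolearn = [ham, spam]`.
# Assumption: thresholds are inclusive; rspamd itself may compare strictly.
autolearn_verdict() {
    # $1 = message score, $2 = ham threshold, $3 = spam threshold
    awk -v score="$1" -v ham="$2" -v spam="$3" 'BEGIN {
        if (score + 0 >= spam + 0)     print "spam"
        else if (score + 0 <= ham + 0) print "ham"
        else                           print "none"
    }'
}

autolearn_verdict 13.12 -4 7     # original thresholds: prints "spam"
autolearn_verdict 13.12 -20 20   # suggested thresholds: prints "none"
```

With [-20, 20] the 13.12 message would be rejected (the reject score is 8.00) but left out of the Bayes database; only messages scoring far above the reject threshold would be learned.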

Anton Onischenko

Sep 10, 2018, 3:46:06 AM

to rspamd

Thanks for the clarification.

On Friday, September 7, 2018 at 17:53:54 UTC+3, Anton Onischenko wrote:
