autolearn spam and ham for redis

Anton Onischenko

Sep 7, 2018, 10:53:54 AM
to rspamd
Hello Everyone!
I installed the new version of rspamd (1.8.0), but I could not get it to autolearn.
/usr/local/etc/rspamd# rspamc stat
Results for command: stat (0.054 seconds)
Messages scanned: 51
Messages learned: 0
Fuzzy hashes in storage "rspamd.com": 150303023
Fuzzy hashes stored: 150303023
Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1884; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1582; users: 1; languages: 0
Total learns: 3466
----

I tried to scan a message:
/usr/local/etc/rspamd# rspamc -p -t 7  -F "w...@mail.ru" -r "ww...@mail.ru" -i "1.1.1.1" --hostname "mail.ru" --helo "mail.ru" < /var/qmail/new/05363273382341361400
Results for file: stdin (0.540 seconds)
[Metric: default]
Action: reject
Spam: true
Score: 13.12 / 8.00
Symbol: ARC_NA (0.00)
Symbol: ASN (0.00)[asn:13335, ipnet:1.1.1.0/24, country:US]
Symbol: DKIM_TRACE (0.00)[]
Symbol: DMARC_POLICY_REJECT (2.00)[: No valid SPF, No valid DKIM, reject]
Symbol: ENVFROM_SERVICE_ACCT (1.00)
Symbol: FORGED_RECIPIENTS (2.00)[ ww...@mail.ru]
Symbol: FORGED_SENDER (0.30)[ w...@mail.ru]
Symbol: FREEMAIL_ENVFROM (0.00)[mail.ru]
Symbol: FREEMAIL_ENVRCPT (0.00)[mail.ru]
Symbol: FROM_HAS_DN (0.00)
Symbol: FROM_NEQ_ENVFROM (0.00)[w...@mail.ru]
Symbol: HAS_LIST_UNSUB (-0.01)
Symbol: HTML_SHORT_LINK_IMG_3 (0.50)
Symbol: MANY_INVISIBLE_PARTS (0.40)[5]
Symbol: MID_RHS_NOT_FQDN (0.50)
Symbol: MIME_BASE64_TEXT (0.10)
Symbol: MIME_GOOD (-0.10)[multipart/alternative, text/plain]
Symbol: MY_FREE_MAIL (0.50)[w...@mail.ru]
Symbol: PHISHING (3.93)[verisign->verisigninc]
Symbol: RCPT_COUNT_ONE (0.00)[1]
Symbol: RCVD_COUNT_THREE (0.00)[3]
Symbol: RCVD_NO_TLS_LAST (0.00)
Symbol: REPLYTO_ADDR_EQ_FROM (0.00)
Symbol: R_DKIM_REJECT (1.00)[mail.ru]
Symbol: R_SPF_SOFTFAIL (0.00)[~all]
Symbol: TO_DN_NONE (0.00)
Symbol: URI_COUNT_ODD (1.00)[23]
Message-ID: 198792dc6e924ed7b965e6a98e4d7119@425
Message - spf: (SPF): spf softfail

----
After that, I ran rspamc stat again and saw the same count of learned messages.

/usr/local/etc/rspamd# rspamc stat
Results for command: stat (0.048 seconds)
Messages scanned: 52
Messages learned: 0
Connections count: 0
Control connections count: 0
Pools allocated: 28
Pools freed: 0
Bytes allocated: 24.87M
Memory chunks allocated: 204
Shared chunks allocated: 16
Chunks freed: 0
Oversized chunks: 1
Fuzzy hashes in storage "rspamd.com": 150311085
Fuzzy hashes stored: 150311085
Statfile: BAYES_SPAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1884; users: 1; languages: 0
Statfile: BAYES_HAM type: redis; length: 0; free blocks: 0; total blocks: 0; free: 0.00%; learned: 1582; users: 1; languages: 0
Total learns: 3466

---
But if I force learning with learn_spam:
/usr/local/etc/rspamd# rspamc -p learn_spam < /var/new/05363273382341361400
the "Messages learned" count in rspamc stat increases.


What is wrong with my configs?





/usr/local/etc/rspamd# cat statistic.conf

classifier "bayes" {
  tokenizer {
    name = "osb";
  }

  name = "common";
  cache {
    #path = "${DBDIR}/learn_cache.sqlite";
    type="redis"
  }
  backend = "redis";
  servers = "127.0.0.1:6379";
  languages_enabled = true;

  min_tokens = 11;
  min_learns = 200;
  autolearn = [-4, 7]

  statfile {
    symbol = "BAYES_HAM";
    #path = "${DBDIR}/bayes.ham.sqlite";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    #path = "${DBDIR}/bayes.spam.sqlite";
    spam = true;
  }
  learn_condition =<<EOD 
  return function(task, is_spam, is_unlearn)
  local prob = task:get_mempool():get_variable('bayes_prob', 'double')

  if prob then
    local in_class = false
    local cl
    if is_spam then
      cl = 'spam'
      in_class = prob >= 0.95
    else
      cl = 'ham'
      in_class = prob <= 0.05
    end

    if in_class then
      return false,string.format('already in class %s; probability %.2f%%',
        cl, math.abs((prob - 0.5) * 200.0))
    end
  end

  return true
end
EOD

  .include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/classifier-bayes.conf"
#  .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/classifier-bayes.conf"
}

#.include(try=true; priority=1) "$LOCAL_CONFDIR/local.d/statistic.conf"
#.include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/statistic.conf"

---
/usr/local/etc/rspamd# cat local.d/classifier-bayes.conf
backend = "redis";
new_schema = true;







Manuel Garbin

Sep 7, 2018, 11:48:18 AM
to rspamd
You have to use the rspamc learn_spam or rspamc learn_ham command.

If you have different Bayes classifiers:

rspamc -c bayes_global learn_spam ...
rspamc -c bayes_user -d us...@example.com learn_spam

Alexander Moisseev

Sep 7, 2018, 2:39:32 PM
to rsp...@googlegroups.com
On 07.09.18 17:53, Anton Onischenko wrote:
> Hello Everyone!
> I installed the new version of rspamd 1.8.0
> But I could not make it autolearn.

Autolearn doesn't work with rspamc or WebUI scans. You need to send email.
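
A quick way to check whether autolearn fired after a real delivery is to compare the learn counter in `rspamc stat` before and after. This is only a minimal sketch, assuming the `Total learns:` line format shown in the stat output earlier in the thread; the `total_learns` helper name is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the "Total learns" counter from
# `rspamc stat` output supplied on stdin.
total_learns() {
    awk -F': *' '/^Total learns:/ { print $2; exit }'
}

# Intended usage (not executed here, requires a running rspamd):
#   before=$(rspamc stat | total_learns)
#   ... deliver a test message through your MTA ...
#   after=$(rspamc stat | total_learns)
#   [ "$after" -gt "$before" ] && echo "learn counter increased"
```

Note that the counter also goes up after a manual learn_spam or learn_ham, so the comparison is only meaningful if nothing else is learning in between.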

>
> What is wrong with my configs?
>

It should work, but changing the stock configuration files is not recommended. You should create configuration files in local.d/ or override.d/ instead.

If you need just one classifier, create local.d/classifier-bayes.conf:

backend = "redis";
servers = "localhost:6379";
new_schema = true;
autolearn = [-5, 7];


BTW, it is not advisable to use autolearn, as it learns false positives as well, especially with such low thresholds.

Autolearn could be useful for learning rejected messages. IMO relatively safe thresholds for that are [-20, 20] (ham will never hit the '-20' threshold, so it could be -9999 as well).
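
To see why the thresholds matter, here is a simplified sketch of the decision that an `autolearn = [ham, spam]` pair implies. It assumes strict comparisons (learn as ham below the first value, as spam above the second) and ignores everything else rspamd checks, such as `learn_condition`, `min_tokens`, or how the message was submitted; `autolearn_class` is a hypothetical helper, not part of rspamd:

```shell
#!/bin/sh
# Hypothetical helper: autolearn_class SCORE HAM_THRESHOLD SPAM_THRESHOLD
# Prints "spam", "ham" or "none" for a given message score.
# Assumption: strict comparisons; real rspamd applies further conditions.
autolearn_class() {
    awk -v s="$1" -v ham="$2" -v spam="$3" 'BEGIN {
        if (s + 0 > spam + 0)      print "spam"   # above the spam threshold
        else if (s + 0 < ham + 0)  print "ham"    # below the ham threshold
        else                       print "none"   # between the two: not learned
    }'
}
```

With the score from the scan earlier in the thread, `autolearn_class 13.12 -4 7` prints `spam`, while `autolearn_class 13.12 -20 20` prints `none`, so only messages scoring well past the reject level would be learned under the wider thresholds.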


Anton Onischenko

Sep 10, 2018, 3:46:06 AM
to rspamd
Thanks for the clarification.  

On Friday, September 7, 2018, 17:53:54 UTC+3, user Anton Onischenko wrote: