Anyone successfully filtering Italian spam?


Nathaniel Griswold

Nov 29, 2021, 6:45:56 PM
to racket...@googlegroups.com
It’s getting through my filters; neither rspamd nor my local client can catch on to it.

Is there a good simple filter?

Nate

William J. Bowman

Nov 29, 2021, 7:31:32 PM
to Nathaniel Griswold, Racket Users
My SpamAssassin catches most of it, but also sometimes catches real list emails because it has stopped trusting racket-users. Some combination of the Bayes and TxRep plugins is doing the heavy lifting, I think.

--
William J. Bowman
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/81262C45-D4BE-45DB-B126-5BD11274299A%40nan.sh.

George Neuner

Nov 29, 2021, 10:38:02 PM
to racket...@googlegroups.com
Something has changed very recently because until this last week I
rarely saw it (even marked AS spam) ... maybe a few times this whole
year.  But in the last week I have gotten one or more copies in my inbox
every day.

I use Thunderbird for email - which has a built-in learning junk filter
- but I assumed my provider was filtering it because - sans a specific
rule - Thunderbird would have just put it in the Junk folder. I just wasn't
seeing it.

???
George


Stephen De Gabrielle

Nov 30, 2021, 6:57:42 AM
to George Neuner, racket...@googlegroups.com
I’m using gmail for racket-users, but the normally reliable spam filtering fails - despite numerous attempts to train - it still classifies real mail as spam and the spam as real.
S.



George Neuner

Nov 30, 2021, 9:10:46 AM
to Stephen De Gabrielle, racket...@googlegroups.com

On 11/30/2021 6:57 AM, Stephen De Gabrielle wrote:
> I’m using gmail for racket-users, but the normally reliable spam
> filtering fails - despite numerous attempts to train - it still
> classifies real mail as spam and the spam as real.
> S.

I'm convinced that Google does that on purpose ... I believe they are
paid to keep users looking in their spam folders.

YMMV,
George

Laurent

Nov 30, 2021, 9:23:09 AM
to George Neuner, Stephen De Gabrielle, racket-users@googlegroups.com List
The last 10 spams have all these words in common:
https://pastebin.com/BB0arV63
and many more (which I won't copy here for obvious reasons).

So you could create a dedicated spam filter that looks for *any* of (not: all of) these words.
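
A minimal sketch of that "any of" test in Python, assuming the message body is available as plain text. The entries in SPAM_WORDS are hypothetical placeholders, not the actual words from the pastebin, and looks_like_spam is an illustrative name rather than part of any mail tool.

```python
import re

# Hypothetical placeholder words; substitute the real list from the pastebin.
SPAM_WORDS = {"parola1", "parola2", "parola3"}

def tokens(text):
    """Lowercase the text and return the set of word-like tokens in it."""
    return set(re.findall(r"\w+", text.lower()))

def looks_like_spam(text, words=SPAM_WORDS):
    """Return True if the message contains at least one listed word."""
    return not tokens(text).isdisjoint(words)
```

Matching whole tokens rather than substrings keeps a short list entry from firing inside a longer, unrelated word.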





George Neuner

Nov 30, 2021, 9:41:57 AM
to Laurent, racket-users@googlegroups.com List

On 11/30/2021 9:22 AM, Laurent wrote:
> The last 10 spams have all these words in common:
> https://pastebin.com/BB0arV63
> and many more (which I won't copy here for obvious reasons).
>
> So you could create a dedicated spam filter that looks for *any* of
> (not: all of) these words.

Unfortunately, this idiot has been spamming a number of (mostly
programming language) groups for more than 10 years now.  Every other
month or so the subject (and hence the relevant keywords) changes, along
with the poster's name and email address, etc.

I used to read the racket groups through NN via Gmane ... sometimes
still do, but mostly use email now.  A few years ago Gmane changed hands
and got flaky for a while, so I activated email delivery on the racket
group server to be sure I was seeing all the posts. Gmane still carries
racket, but many of the other groups it used to carry are gone.

My NN reader has several filters dedicated to this Italian spam, but
(until recently) I didn't need to filter it from my email.

George

Nathaniel W Griswold

Nov 30, 2021, 9:43:41 AM
to George Neuner, Laurent, racket-users@googlegroups.com List
Ok. Well i don't often get Italian emails so maybe i will make a special "Italian" Inbox =)

Nate

Nathaniel W Griswold

Dec 1, 2021, 1:30:54 PM
to Laurent, George Neuner, Stephen De Gabrielle, racket-users@googlegroups.com List


> On Nov 30, 2021, at 8:22 AM, Laurent <laurent...@gmail.com> wrote:
>
> The last 10 spams have all these words in common:
> https://pastebin.com/BB0arV63
> and many more (which I won't copy here for obvious reasons).
>
> So you could create a dedicated spam filter that looks for *any* of (not: all of) these words.

I am gonna do this, actually, because i can see a few more words that are in pretty much all of the ones that actually get through my barriers. Judging by the google translation, it seems to consistently be a (very narrowly) "coherent" smear on the same people for the same things every time, and you have included some of the proper nouns, too, so this seems to be the way to go. I think it will get everything.

Thanks,

Nate

Nathaniel W Griswold

Dec 1, 2021, 1:42:22 PM
to Laurent, George Neuner, Stephen De Gabrielle, racket-users@googlegroups.com List
It's interesting, i think. It seems to me that the spammer is prefixing all of the proper nouns with hash symbol or something else. Maybe that is the reason the spam filters are having trouble, because they are not tokenizing the actual words, or treat hash tags specially, or something.
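
If that guess is right, one workaround is to strip the prefix before matching. A rough Python sketch, assuming the words are separated by whitespace; the names in the usage example are made up, and normalize_tokens is an illustrative helper, not something rspamd or any other filter provides.

```python
def normalize_tokens(text):
    """Split on whitespace, strip leading/trailing punctuation such as '#', and lowercase."""
    out = []
    for raw in text.split():
        # Remove the hash prefix and common punctuation from both ends of the token.
        word = raw.strip("#@.,;:!?\"'()[]").lower()
        if word:
            out.append(word)
    return out

# Made-up example input:
# normalize_tokens("#MARIO #Rossi, ciao") returns ["mario", "rossi", "ciao"]
```

With the prefix gone, "#MARIO" and "Mario" end up as the same token, so a word list or a trained classifier sees one word instead of two.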

I think this person should teach a class on how to spam.

Nate

Nathaniel W Griswold

Dec 1, 2021, 3:42:01 PM
to Laurent, George Neuner, Stephen De Gabrielle, racket-users@googlegroups.com List


I am putting my simple sieve script here in case anyone is interested. I think it will get the stragglers; as far as I can tell, the things it matches have been constant in every straggler since I first started getting this spam.

Sieve is funny because you have to look at like 5 RFCs to figure anything out from primary sources, and all of the third party non-rfc guides you can find from the search engines seem very limited so you kinda have to use the RFCs.

https://pastebin.com/wxXxBEEc

Nate

George Neuner

Dec 1, 2021, 5:01:41 PM
to Nathaniel W Griswold, racket-users@googlegroups.com List
I've been dealing with this for well over 10 years reading various news
groups.  You need to match on descriptive nouns: criminal, terrorist,
etc.  Over time the names of the people and the descriptions of them
will change.

If you look carefully at the Italian, you'll see the same descriptors
come in multiple forms which change in use due to declension (parts of
speech).  You need to match word stems rather than words:  e.g., crimi
or crimin so you catch all the different forms of criminal.  Because
the descriptors are nouns, you have to match them case insensitive,
because they may or may not be capitalized depending on use.  The recent
spam has been all caps, but that hasn't always been so.

Filtering can be done fairly easily with regex (which my NN reader has)
once you figure out what to look for, but most email filter systems
offer only a simple "contains" filter where you have to supply every
possible capitalization (at least 3: none, all, and first).
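
As a rough illustration of stem matching, here is a Python sketch using the standard re module. The two stems are only examples taken from the descriptors mentioned above, and matches_stem is a hypothetical helper name; a real filter would need whatever stems the current wave of spam uses.

```python
import re

# Example stems only; adjust to the descriptors that actually appear.
STEMS = ["crimin", "terror"]

# \b anchors the stem at the start of a word, \w* accepts any ending,
# and IGNORECASE covers lower case, ALL CAPS, and Capitalized forms.
STEM_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STEMS)) + r")\w*",
    re.IGNORECASE,
)

def matches_stem(text):
    """Return True if any word in the text starts with one of the stems."""
    return STEM_RE.search(text) is not None
```

Anchoring at the word start means "criminale", "CRIMINALI", and "Terrorista" all match, while an unrelated word that merely contains a stem, such as "discriminate", does not.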

Good Luck!
George

Aidan Gauland

Dec 1, 2021, 11:01:51 PM
to racket...@googlegroups.com

I am on Fastmail, which uses SpamAssassin (their methodology is detailed here <https://www.fastmail.help/hc/en-us/articles/360060591413-Spam-filtering>), and after many many months of training it on this loon's spam messages, it is finally mostly filtering correctly.  Very few legitimate messages are getting sent to the spam bin now.  (This is across multiple OSS mailing-lists.)

I hope this information helps somewhat.
