
Re: False positives in spam filter? (was: Re: portable way to get highest bit set?)


Michael S

Oct 15, 2023, 9:35:30 AM
On Sun, 15 Oct 2023 11:17:15 -0000 (UTC)
Ray Banana <ray...@raybanana.net> wrote:

> * Michael S wrote:
> > It turned out that people who read this group through Eternal
> > September had never seen my answer from ~3 days ago. Right now the ES
> > spam filter is rather unpredictable and appears to suffer both from
> > insufficient strictness and from false positives. The i2pn2 spam
> > filter is considerably better.
> > Anyway, I'll post my reply again below.
>
> PMFJI, just a remark regarding the above comment:
>
> I assume you are referring to the article with Message ID
>
> <20231012191...@yahoo.com>
>
> This article is actually available on E-S. Since you
> posted it via E-S, it wouldn't have been scanned by
> the spam filter, anyway, as this filter only scans
> articles originating from GG and some Highwinds Media
> resellers based on the Injection-Info: header.
>
> If you would like to follow up on this issue,
> please reply either on eternal-september.support or
> to my email address.
>

Please ignore it.
It was either a problem with my newsreader or, more likely, my own
mishandling of the newsreader's UI.
On the same computer, with a different newsreader, I see the post on
Eternal September just fine.
Sorry for the misleading complaint.

That said, my other complaints are not necessarily wrong.
One example is <4e3940f1-8c6e-4786...@googlegroups.com>
The post can be seen here:
www.novabbs.com/devel/article-flat.php?id=29756&group=comp.lang.c#29756



Ray Banana

Oct 15, 2023, 10:40:10 AM

Thus spake Michael S <already...@yahoo.com>

> It was either a problem with my newsreader or, more likely, my own
> mishandling of the newsreader's UI.
> On the same computer, with a different newsreader, I see the post on
> Eternal September just fine.
> Sorry for the misleading complaint.

No problem.

> That said, my other complaints are not necessarily wrong.
> One example is <4e3940f1-8c6e-4786...@googlegroups.com>
> The post can be seen here:
> www.novabbs.com/devel/article-flat.php?id=29756&group=comp.lang.c#29756

Thanks, I'm always grateful for information about false positives.

Oct 12 11:20:31 news innd[1182792]: filter: Sending article <4e3940f1-8c6e-4786...@googlegroups.com> from google-groups.googlegroups.com to SpamAssassin
Oct 12 11:20:33 news spamcheck[2387491]: Accepting article <4e3940f1-8c6e-4786...@googlegroups.com>, score: 2.5

So it wasn't my spam filter that killed it, as it requires a spam score
above 10 to cancel an article. However, it is no longer available on
E-S, so I will check the NoCeMs issued by usenet.ovh.
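
For anyone who wants to run the same check over a batch of log lines, here is a minimal sketch in Python. It assumes only the line format quoted above ("article <id>, score: N") and the threshold of 10 mentioned in this post. The names parse_score and would_cancel are made up for illustration, and the format of rejection lines is not shown in this thread, so they are not handled.

```python
import re

# Threshold taken from the post above: an article is only cancelled
# when its spam score is above 10.
CANCEL_THRESHOLD = 10.0

# Matches the part of a spamcheck line that looks like:
#   "... article <message-id>, score: 2.5"
SCORE_LINE = re.compile(r"article (<[^>]+>), score: (-?\d+(?:\.\d+)?)")


def parse_score(line):
    """Return (message_id, score) from a spamcheck log line, or None."""
    match = SCORE_LINE.search(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def would_cancel(score, threshold=CANCEL_THRESHOLD):
    """True only when the score is strictly above the threshold."""
    return score > threshold


# The log line quoted above (Message-ID truncated exactly as in the thread).
line = ("Oct 12 11:20:33 news spamcheck[2387491]: Accepting article "
        "<4e3940f1-8c6e-4786...@googlegroups.com>, score: 2.5")
message_id, score = parse_score(line)
print(message_id, score, would_cancel(score))
```

With a score of 2.5 against a threshold of 10, the check comes out negative, which matches the conclusion that the E-S filter did not remove the article.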

Thanks again for sharing your observations.

--
Пу́тін — хуйло́
http://www.eternal-september.org

ESE LOKO

Oct 16, 2023, 11:04:43 PM

Ray Banana

Oct 17, 2023, 8:18:24 AM
Thus spake Mickey <Micke...@gmail.invalid>

> Ray Banana wrote:

>>Thanks, I'm always grateful for information about false positives.

> More false positives, probably the Subject: -
> Message-ID: <23d07321-4332-4454...@googlegroups.com>
> Message-ID: <640604af-e6f4-435b...@googlegroups.com>
> Message-ID: <4523406a-bec9-480d...@googlegroups.com>
> Message-ID: <bb897c0b-53ff-4d45...@googlegroups.com>
> Message-ID: <b6f86a0f-86e3-4782...@googlegroups.com>
> Message-ID: <bdb6c14e-a217-49ab...@googlegroups.com>

Thanks, fixed.

Michael S

Oct 18, 2023, 5:04:41 AM

Mark Bourne

Oct 18, 2023, 4:13:51 PM
Ray Banana wrote:
> Thanks, I'm always grateful for information about false positives.

In comp.lang.python:

Message-ID: <708b6c8e-2196-4253...@googlegroups.com>

Seems to be missing on Eternal September. On the plus side, the spam in
that group seems to be very much reduced, so thanks for the filtering!

--
Mark.

Chris Pitt Lewis

Oct 20, 2023, 5:56:55 AM
On 17/10/2023 13:18, Ray Banana wrote:
> Thus spake Mickey <Micke...@gmail.invalid>
>
>> Ray Banana wrote:
>
>>> Thanks, I'm always grateful for information about false positives.
>

The following look like false positives on soc.genealogy.medieval
yesterday - at least, I downloaded the headers but by the time I came to
read them the articles were said to be expired. I can see from replies
to them that they weren't spam.

<c5b91e02-1060-46a8...@googlegroups.com> (69895)
<fd27b8dc-b6cc-4cc8...@googlegroups.com> (69899)
<1bc9698d-309a-4596...@googlegroups.com> (69900)

These two on 17 and 18 Oct from a regular poster didn't get through at all:
https://groups.google.com/g/soc.genealogy.medieval/c/UDttfNyQ714/m/z-9LJtqxAQAJ
https://groups.google.com/g/soc.genealogy.medieval/c/UDttfNyQ714/m/4mJ2JEExAQAJ

Or, so far, these two from 00.03 and 00.51 this morning:
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/vZRzXmZQAAAJ
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/kU-uNwVTAAAJ

and these earlier ones:
https://groups.google.com/g/soc.genealogy.medieval/c/4IffJ6_Ir2M/m/94nDKjVIAgAJ [5 Oct]
https://groups.google.com/g/soc.genealogy.medieval/c/4IffJ6_Ir2M/m/qwaPmYjNAwAJ [9 Oct]
https://groups.google.com/g/soc.genealogy.medieval/c/Yo6p9Jf29hA/m/PEM8hwDOAwAJ [10 Oct]
https://groups.google.com/g/soc.genealogy.medieval/c/Euv6mNIXweI/m/E8qw8q-OAwAJ [9 Oct]

Thank you very much for all your hard and almost entirely successful
work dealing with the spam.

--
Chris Pitt Lewis

Ray Banana

Oct 20, 2023, 6:38:04 AM
Thus spake Chris Pitt Lewis <ch...@cjpl.co.uk>

> The following look like false positives on soc.genealogy.medieval
> yesterday - at least, I downloaded the headers but by the time I came
> to read them the articles were said to be expired. I can see from
> replies to them that they weren't spam.

This behaviour indicates that these articles were removed by NoCeM
messages from other servers, as the E-S filter will immediately reject
spam articles. I will check who sent these NoCeMs and inform them about
the false positives. The E-S spam filter did not find anything spammy
about this article:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
    news.eternal-september.org
X-Spam-Level: **
X-Spam-Status: No, score=2.5 required=10.0 tests=BAYES_60 autolearn=no
    autolearn_force=no version=3.4.6
X-Spam-Languages:
X-Received: by 2002:ac8:670b:0:b0:41c:bf35:b00 with SMTP id e11-20020ac8670b000000b0041cbf350b00mr61810qtp.11.1697736393153;
    Thu, 19 Oct 2023 10:26:33 -0700 (PDT)
X-Received: by 2002:a05:6808:1492:b0:3af:75e6:efa8 with SMTP id
    e18-20020a056808149200b003af75e6efa8mr1062439oiw.6.1697736392959; Thu, 19 Oct
    2023 10:26:32 -0700 (PDT)
Path: ...!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: soc.genealogy.medieval
Date: Thu, 19 Oct 2023 10:26:32 -0700 (PDT)
Injection-Info: google-groups.googlegroups.com; posting-host=85.146.50.204; posting-account=q8Fj4woAAACJy7yyPuOqNB8ojQG6Cd-D
NNTP-Posting-Host: 85.146.50.204
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <c5b91e02-1060-46a8...@googlegroups.com>
Subject: Only UK?
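
The X-Spam-Status line in the headers above carries the verdict, the score and the threshold in one place. As a rough illustration of how to read it mechanically, here is a minimal Python sketch. The function name parse_spam_status is hypothetical, and the parsing is deliberately simple: it assumes "key=value" pairs separated by whitespace, so a tests= list that is folded across lines after a comma would be cut short.

```python
import re


def parse_spam_status(header_value):
    """Parse an X-Spam-Status value into verdict, score, threshold, tests."""
    # Folded headers continue on the next line; collapse all whitespace first.
    value = " ".join(header_value.split())
    verdict = value.split(",", 1)[0].strip()
    fields = dict(re.findall(r"(\w+)=(\S+)", value))
    return {
        "is_spam": verdict.lower() == "yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": fields.get("tests", "").split(","),
    }


# Header value as quoted in the post above.
status = parse_spam_status(
    "No, score=2.5 required=10.0 tests=BAYES_60 autolearn=no\n"
    "    autolearn_force=no version=3.4.6"
)
print(status)
```

For the article in question this gives a score of 2.5 against a required 10.0, with BAYES_60 as the only rule that fired.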

Retro Guy

Oct 20, 2023, 7:00:19 AM
On Fri, 20 Oct 2023 12:34:37 +0200
Ray Banana <ray...@raybanana.net> wrote:

> Thus spake Chris Pitt Lewis <ch...@cjpl.co.uk>
>
> > The following look like false positives on soc.genealogy.medieval
> > yesterday - at least, I downloaded the headers but by the time I came
> > to read them the articles were said to be expired. I can see from
> > replies to them that they weren't spam.
>
> This behaviour indicates that these articles were removed by NoCeM
> messages from other servers, as the E-S filter will immediately reject
> spam articles. I will check who sent these NoCeMs and inform them about
> the false positives. The E-S spam filter did not find anything spammy
> about this article:
>
> Message-ID: <c5b91e02-1060-46a8...@googlegroups.com>
> Subject: Only UK?

It looks like my (i2pn.org) SpamAssassin flagged this message and issued
a NoCeM for it:

* 2.3 EMPTY_MESSAGE Message appears to have no textual parts
* 3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
* [score: 1.0000]

The message is not empty, so I'm not sure why SpamAssassin says it is. I
will look into this.

--
Retro Guy

yamo'

Oct 21, 2023, 4:52:48 AM
Hi,
Retro Guy wrote on 20/10/2023 13:00:
> * 2.3 EMPTY_MESSAGE Message appears to have no textual parts

I had to disable this rule; it does not work for NNTP messages (on
SpamAssassin 3.4.6).

score EMPTY_MESSAGE 0.0
describe EMPTY_MESSAGE Bug spam assassin?
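
To show what zeroing the rule does to the total, here is a small Python sketch using the two rule scores Retro Guy quoted (2.3 for EMPTY_MESSAGE, 3.5 for BAYES_99). It is only arithmetic, not how SpamAssassin is implemented; the helper total_score is made up for illustration, and the threshold used on i2pn2 is not stated in this thread, so no verdict is computed.

```python
# Rule scores as reported in Retro Guy's post earlier in the thread.
hits = {"EMPTY_MESSAGE": 2.3, "BAYES_99": 3.5}


def total_score(hits, overrides=None):
    """Sum the rule scores, applying local overrides such as 'score RULE 0.0'."""
    overrides = overrides or {}
    total = sum(overrides.get(rule, score) for rule, score in hits.items())
    # Scores are quoted to one decimal place; round to hide float noise.
    return round(total, 1)


before = total_score(hits)
after = total_score(hits, {"EMPTY_MESSAGE": 0.0})
print(before, after)
```

With the override in place, only the Bayes score is left for that message, so the false EMPTY_MESSAGE hit no longer adds to the total.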


--
Stéphane

Apd

Oct 21, 2023, 5:31:32 AM
False positives (perhaps because of the number of links they contain?).
They are also missing from paganini, so it may not be the E-S filter.

<08ea6357-ea30-400e...@googlegroups.com>
<0c85d978-c28f-4e6c...@googlegroups.com>
<7e36aecd-73a8-49b9...@googlegroups.com>


Retro Guy

Oct 21, 2023, 8:00:14 AM
I did the same in response to the previous false positive.

Thank you for the input! Yes, it does seem to trigger for messages that
do have a body.

--
Retro Guy

Ray Banana

Oct 21, 2023, 8:17:25 AM
This is going to be tricky. It's either false positives
or false negatives if I don't count quoted links.

Apd

Oct 21, 2023, 4:28:41 PM
"Ray Banana" wrote:
>* Apd wrote:
>> False positives (perhaps because of the number of links they contain?).
>> They are also missing from paganini, so it may not be the E-S filter.
>>
>><08ea6357-ea30-400e...@googlegroups.com>
>><0c85d978-c28f-4e6c...@googlegroups.com>
>><7e36aecd-73a8-49b9...@googlegroups.com>
>
> This is going to be tricky. It's either false positives
> or false negatives if I don't count quoted links.

Blacklist IP ranges used by GG spammers? I don't know how practical
that would be (maybe it's already being done). GG posts always have
the posting-host info.
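
As a rough sketch of that idea in Python: pull the posting-host out of the Injection-Info header and test it against a list of networks. The blocklist here is purely hypothetical (192.0.2.0/24 is a documentation-only range), the function names are invented for the example, and nothing in this thread says any server actually filters this way.

```python
import ipaddress
import re

# Hypothetical blocklist; 192.0.2.0/24 is reserved for documentation.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def posting_host(injection_info):
    """Extract the posting-host address from an Injection-Info value, if any."""
    match = re.search(r'posting-host="?([0-9A-Fa-f.:]+)"?', injection_info)
    if match is None:
        return None
    try:
        return ipaddress.ip_address(match.group(1))
    except ValueError:
        # Not a literal IP address (for example a host name fragment).
        return None


def is_blocked(injection_info, networks=BLOCKED_NETWORKS):
    """True when the posting host falls inside one of the given networks."""
    host = posting_host(injection_info)
    return host is not None and any(host in net for net in networks)


# Header value from the legitimate article quoted earlier in the thread.
legit = ("google-groups.googlegroups.com; posting-host=85.146.50.204; "
         "posting-account=q8Fj4woAAACJy7yyPuOqNB8ojQG6Cd-D")
# Made-up value that falls inside the hypothetical blocked range.
made_up = "example.invalid; posting-host=192.0.2.17"

print(is_blocked(legit), is_blocked(made_up))
```

The obvious weakness is the one raised in the question itself: ordinary posters can share address ranges with spammers, so a range block trades false negatives for false positives.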


Ray Banana

Oct 22, 2023, 10:49:06 AM
Thus spake Mickey <Micke...@gmail.invalid>

> Ray Banana wrote:

> Yet could not find them listed in
> E-S nocems in news.lists.filters. Appears your filters are doing the
> right thing, others? not so much.

> Message-ID: <eb24b888-f40c-4c57...@googlegroups.com>
> Message-ID: <0d9d5512-ecaf-4525...@googlegroups.com>
> Message-ID: <68626ad8-bf05-49d9...@googlegroups.com>
> Message-ID: <c68f677d-c959-448a...@googlegroups.com>
> Message-ID: <25f34c55-ef00-4157...@googlegroups.com>
> Message-ID: <590bd2df-4d65-4683...@googlegroups.com>
> Message-ID: <3219da0f-4f7f-4175...@googlegroups.com>
> Message-ID: <793f1fb7-12e8-40ec...@googlegroups.com>
> Message-ID: <05da5125-4649-4885...@googlegroups.com>
> Message-ID: <ed12cf7d-6bde-445b...@googlegroups.com>
> Message-ID: <e0886151-f220-4778...@googlegroups.com>
> Message-ID: <59b09d24-4f86-4890...@googlegroups.com>
> Message-ID: <04ae0423-5750-45e1...@googlegroups.com>
> Message-ID: <ce8086c8-2a13-44f1...@googlegroups.com>
> Message-ID: <8b277361-f70b-4f52...@googlegroups.com>
> Message-ID: <ef91608d-7b8a-4ca6...@googlegroups.com>

Thanks very much. I believe the admins of i2pn2 and usenet.ovh are
following this thread in this group.

Ray Banana

Oct 22, 2023, 10:50:01 AM

Retro Guy

Oct 22, 2023, 11:35:05 AM
Yes, I am (i2pn2). I've temporarily disabled nocem to news.lists.filters as
I browse the false positives for clues.

Thanks for letting us know, Mickey.

--
Retro Guy

Ray Banana

Oct 22, 2023, 12:27:53 PM
Thus spake Retro Guy <retr...@i2pn2.org>

> Yes, I am (i2pn2). I've temporarily disabled nocem to news.lists.filters as
> I browse the false positives for clues.

OK, that gives me the opportunity to check the false negatives that got
through the E-S spam filter ;-)

Chris Pitt Lewis

Oct 22, 2023, 12:43:07 PM

Some more genuine messages to soc.genealogy.medieval on 20 and 21
October which are showing on google groups but which have not got
through on E-S, presumably because they were false positives on
someone's filter. Particularly frustrating as they are among the few
recent messages on the group that are properly on topic rather than
discussing what to do about the spam.

https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/3MNAV4ACAwAJ
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/o9vJGk4JAwAJ
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/eh0WDu0PAwAJ
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/FQecr2QWAwAJ
https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/-0lbMzY0AwAJ

Two later messages at 0611 and 0619 this morning in the same thread did
get through.
--
Chris Pitt Lewis

Chris Pitt Lewis

Oct 22, 2023, 12:47:34 PM

Ray Banana

Oct 22, 2023, 1:11:52 PM
Thus spake Chris Pitt Lewis <ch...@cjpl.co.uk>

> Some more genuine messages to soc.genealogy.medieval on 20 and 21
> October which are showing on google groups but which have not got
> through on E-S, presumably because they were false positives on
> someone's filter. Particularly frustrating as they are among the few
> recent messages on the group that are properly on topic rather than
> discussing what to do about the spam.

> https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/3MNAV4ACAwAJ
> https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/o9vJGk4JAwAJ
> https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/eh0WDu0PAwAJ
> https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/FQecr2QWAwAJ
> https://groups.google.com/g/soc.genealogy.medieval/c/HT4rgYVzKlg/m/-0lbMzY0AwAJ

Thanks for the pointer. Could you possibly post the Message-IDs of these
articles? I don't seem to be able to find them in the GG web interface (I
don't usually use it, so maybe it's my fault).

TIA.

Chris Pitt Lewis

Oct 22, 2023, 6:27:42 PM
I cannot find it either. The above details are all I get if I click on
"copy link" from the menu behind the three dots at the top right hand
side which appear if I open the message. Greyed out on that menu is
"show original message". It appears that I need "view member email
addresses permission" for that, and it also appears that I can only have
that if I am the Group owner, a concept which of course does not exist
for Usenet.
If it is any help (probably not) the messages are 3 from Jeff Homes and
2 from JBrand with a subject line "Some further refs. to Rev. John Heart
& family".
I will see if I can get a 14 day free trial at Giganews which may enable
me to find these details (and download the missing articles).
--
Chris Pitt Lewis

Chris Pitt Lewis

Oct 22, 2023, 7:20:48 PM
OK, that worked (though Giganews shows all the spam, and I had to wade
through it). Here are the message IDs:
<ad3d918a-5b40-4fca...@googlegroups.com>
<9c0f5eb0-f6cd-4685...@googlegroups.com>
<21efb112-311d-48f0...@googlegroups.com>
<7852509b-415b-4e4a...@googlegroups.com>
<eb5343a9-129f-400f...@googlegroups.com>

Two messages in the same thread from earlier on 20 Oct, also blocked,
mentioned in my original posting:
<ad092218-3ca2-45df...@googlegroups.com>
<ba6b5431-09f5-4318...@googlegroups.com>

And two on 22 Oct from the same poster which got through on E-S (numbers
69917 and 69918):
<ac14d4da-6e83-49d4...@googlegroups.com>
<2edcff4c-a656-45ec...@googlegroups.com>

--
Chris Pitt Lewis

Retro Guy

Oct 23, 2023, 8:36:55 PM
On Sun, 22 Oct 2023 08:32:06 -0700
Retro Guy <retr...@i2pn2.org> wrote:

> On Sun, 22 Oct 2023 16:47:38 +0200
> Ray Banana <ray...@raybanana.net> wrote:
>
> > Thus spake Mickey <Micke...@gmail.invalid>
> >
> > > Ray Banana wrote:
> >
> > Yet could not find them listed in
> > > E-S nocems in news.lists.filters. Appears your filters are doing the
> > > right thing, others? not so much.
> >
> > > snip
> >
> > Thanks very much. I believe the admins of i2pn2 and usenet.ovh are
> > following this thread in this group.
>
> Yes, I am (i2pn2). I've temporarily disabled nocem to news.lists.filters as
> I browse the false positives for clues.

I started NoCeM up again yesterday (i2pn2.org), and did a lot more tweaking this
morning. As I monitor what is identified as spam (I have it saved in a .mbox
file and scroll through with mutt), today I'm finding a false positive rate of
0.075%. Consider that there are tens of thousands of messages filtered as spam
per day.

Every false positive I find, I try to work out what I can change to avoid it,
but that percentage is quite good. I'm not sure it will get any better than that.
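
To put that percentage into absolute numbers, here is a small Python sketch. The 0.075% figure is from the post; the daily volumes are assumptions picked to cover "tens of thousands", and the rate is taken to be relative to the messages flagged as spam, which is how the post reads.

```python
def expected_false_positives(filtered_per_day, false_positive_rate_percent):
    """Expected number of legitimate messages among those flagged as spam."""
    return filtered_per_day * false_positive_rate_percent / 100.0


# Assumed daily volumes; the post only says "tens of thousands".
for volume in (10_000, 20_000, 50_000):
    estimate = expected_false_positives(volume, 0.075)
    print(f"{volume} flagged per day: about {estimate:.1f} false positives")
```

So even at this rate a handful of legitimate articles per day can still go missing, which is consistent with the individual reports in this thread.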

--
Retro Guy

Madhu

Oct 29, 2023, 2:05:28 AM
The missing message in that thread (in a.u.e) seems to be

<760cb4a0-09cb-47b4...@googlegroups.com>

Ref:
|Subject: Re: Near East or Middle East?
|Newsgroups: eternal-september.support
|Date: Fri, 27 Oct 2023 08:22:40 +0530
|Xref: news.eternal-september.org eternal-september.support:16025
