Cleanfeed EMP filters


Steve

Dec 19, 2007, 9:54:07 AM
Hi,

I've produced some updates to Cleanfeed to address the current sporge
type floods that are currently attacking a number of groups. In
accordance with the license I'm publishing these updates and making them
available should anyone else wish to use them.

http://www.mixmin.net/cleanfeed
http://www.mixmin.net/cleanfeed.diff
The diff is based on the last issued cleanfeed-20020501.

The update introduces two new EMP filters:
PHN  NNTP-Posting-Host/Newsgroups
PHR  NNTP-Posting-Host (High-risk groups)

The role of the PHR filter is to solve the issue with PHN where an
abuser circumvents it by cross-posting to the target group plus one other
random group so as to avoid hash collisions.
Eg:-
Newsgroups: sci.crypt,alt.foo
Newsgroups: sci.crypt,alt.bar
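
The following is a minimal Python sketch (cleanfeed itself is Perl) of why the cross-post trick defeats a key built from host plus Newsgroups, and why a host-only key restricted to listed groups does not have that weakness. The functions `phn_key` and `phr_key`, the MD5 digest, the example host and the `flood_groups` pattern are all assumptions made for illustration; the real key construction may differ.

```python
import hashlib
import re

def phn_key(posting_host, newsgroups):
    # Assumed PHN-style key: posting host plus the complete Newsgroups header.
    return hashlib.md5(f"{posting_host}\n{newsgroups}".encode()).hexdigest()

def phr_key(posting_host, newsgroups, flood_groups):
    # Assumed PHR-style key: posting host alone, applied only when at least
    # one listed group matches a flood_groups pattern.
    groups = [g.strip() for g in newsgroups.split(",")]
    if any(re.search(pattern, group) for pattern in flood_groups for group in groups):
        return hashlib.md5(posting_host.encode()).hexdigest()
    return None

host = "192.0.2.10"                 # hypothetical abusive posting host
flood_groups = [r"^sci\.crypt$"]    # hypothetical flood_groups setting
first, second = "sci.crypt,alt.foo", "sci.crypt,alt.bar"

# The PHN-style keys differ, so each variant gets its own counter.
print(phn_key(host, first) == phn_key(host, second))
# The PHR-style keys are identical, so both posts hit the same counter.
print(phr_key(host, first, flood_groups) == phr_key(host, second, flood_groups))
```

Under these assumptions, varying the second group gives every article a fresh PHN key, while the PHR key stays constant for as long as the target group is listed.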

Both filters include an aggressive switch which will enable them to use
a component of the Path header in the absence of an NNTP-Posting-Host
header. Using this mode in the PHN filter may be too aggressive for
many feeds unless care is taken to exempt groups and posting-hosts.
This is especially true if the news server propagates mailing lists
which all have the same Posting Host.

Configuration parameters are:
phn_exempt      NNTP-Posting-Host regexes to exempt from the PHN filter
phr_exempt      NNTP-Posting-Host regexes to exempt from the PHR filter
phn_aggressive  Allow PHN filter to use Path elements (Here be dragons)
phr_aggressive  Allow PHR filter to use Path elements
phn_exclude     Newsgroup regexes to exclude from the PHN filter
flood_groups    Regexes of groups to include in the PHR filter

All of the above parameters should be defined in the cleanfeed.local
file but for my purposes during development they are configured within
cleanfeed itself.

In addition to the above two EMP filters, I've integrated a patch
written and published by Jeffrey Vinocur to allow filtering of MIME
encapsulated HTML.

There's also support for a bad_groups file that works the same as
bad_hosts and the like. It just provides a quick means to disallow
specific groups should an admin wish to do so.

Lastly, this is all very much work in progress. If anyone cares to view
or contribute to the code then I'm happy to grant Subversion access.

Steve

--
pub 1024D/228761E7 2003-06-04 Steven Crook
Key fingerprint = 1CD9 95E1 E9CE 80D6 C885 B7EB B471 80D5 2287 61E7
uid Steven Crook <st...@mixmin.net>

Julien ÉLIE

Dec 19, 2007, 2:58:13 PM
Hi Steve,

> I've produced some updates to Cleanfeed to address the current sporge
> type floods that are currently attacking a number of groups. In
> accordance with the license I'm publishing these updates and making them
> available should anyone else wish to use them.

That's great. Thanks!


> There's also support for a bad_groups file that works the same as
> bad_hosts and the like. It just provides a quick means to disallow
> specific groups should an admin wish to do so.

What would be really great is a place where newsmasters can *always*
find the latest version of the bad_* files. It would indeed be very useful
for filtering spam better.


As for other patches, you might perhaps want to add this one:

@@ -616,7 +613,10 @@
  [Tt][Ee]?[Xx][Tt]|
  [Hh][Tt][Mm][Ll]?|
  [Ee][Xx][Ee]|
- [Uu][Rr][Ll]
+ [Uu][Rr][Ll]|
+ [Jj][Pp][Ee]?[Gg]|
+ [Gg][Ii][Ff]|
+ [Pp][Nn][Gg]
  )
  \s+ # end of line
  (?:
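
As a rough illustration of what the extended alternation accepts, here is a small Python sketch. Only the alternation itself comes from the patch; the leading `\.`, the trailing `$` and the helper `has_listed_extension` are assumptions added so the example runs on its own, since the surrounding cleanfeed regex is not shown here.

```python
import re

# Standalone pattern built from the alternation in the patch. The "\." prefix
# and "$" anchor are assumptions, not part of the cleanfeed source.
EXTENSION = re.compile(r"""
    \.(?:
        [Tt][Ee]?[Xx][Tt]|
        [Hh][Tt][Mm][Ll]?|
        [Ee][Xx][Ee]|
        [Uu][Rr][Ll]|
        [Jj][Pp][Ee]?[Gg]|
        [Gg][Ii][Ff]|
        [Pp][Nn][Gg]
    )$
""", re.VERBOSE)

def has_listed_extension(filename):
    # True when the name ends in one of the listed extensions, in any case.
    return EXTENSION.search(filename) is not None

for name in ("photo.JPG", "scan.jpeg", "icon.Png", "archive.zip"):
    print(name, has_listed_extension(name))
```

The `[Ee]?` in the JPEG branch is what lets a single alternative cover both "jpg" and "jpeg".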

--
Julien ÉLIE

« -- Say, I think I heard someone speaking Gothic over there!
-- You're seeing things, Pamplemus! » (Astérix)

Julien ÉLIE

Dec 19, 2007, 2:59:11 PM
>> There's also support for a bad_groups file that works the same as
>> bad_hosts and the like. It just provides a quick means to disallow
>> specific groups should an admin wish to do so.
>
> What would be really great is a place where newsmasters can *always*
> find the latest version of the bad_* files. It would indeed be very useful
> for filtering spam better.


Or another, better idea from Xavier:

----- Original Message -----
From: "Xavier Roche" <xro...@free.fr.NOSPAM.invalid>
Message-ID: <fhf9vd$uq9$5...@news.httrack.net>
Newsgroups: news.admin.net-abuse.policy
Sent: Wednesday, November 14, 2007 18:06
Subject: Re: [Usenet] new filter, which bad paths

> Tim Skirvin wrote:
>> I would prefer that there was some automated source to trust on
>> this kind of thing, with a signed cleanfeed configuration file that I
>> could download hourly/nightly or whatever.
>
> Or a set of rules, that could be merged by any server, with an expiration time, sent regularly through a mechanism similar to
> NoCeM ?
>
> Something like:
>
> notice acbd18db
> type hipcrime_flood
> expires 180
> comment "hipcrime flood"
> action hide
> rule x-complaints-to: ^abuse@rr\.com$
> path: !roadrunner\.com!
> rule nntp-posting-host: (75\.184\.99\.32|72\.190\.76\.22)
>
> And, if by chance, someone is at last doing something, we could think of a "remove acbd18db" message.
>
> [ In the meantime, most flood articles were advertised through NoCeM notices ]
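
To make Xavier's proposal a little more concrete, here is a hedged Python sketch of how a server might parse and apply such a notice. The format was only ever an informal suggestion, so everything below is an assumption: the function names `parse_notice` and `matches`, treating a line without the `rule` keyword (the `path:` line) as an extra condition on the preceding rule, requiring all conditions of a rule to match, and leaving `expires` as an uninterpreted string because its unit is not stated.

```python
import re

FIELDS = {"notice", "type", "expires", "comment", "action"}

def parse_notice(text):
    notice = {"rules": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        word, _, rest = line.partition(" ")
        if word in FIELDS:
            notice[word] = rest.strip().strip('"')
            continue
        if word == "rule":
            notice["rules"].append([])      # start a new rule
            line = rest.strip()
        elif not notice["rules"]:
            raise ValueError(f"condition before any rule: {line!r}")
        # Assumption: "header: regex"; a bare line extends the previous rule.
        header, _, pattern = line.partition(":")
        notice["rules"][-1].append((header.strip().lower(), re.compile(pattern.strip())))
    return notice

def matches(notice, headers):
    # An article matches if every condition of at least one rule matches.
    lowered = {name.lower(): value for name, value in headers.items()}
    return any(
        all(regex.search(lowered.get(header, "")) for header, regex in rule)
        for rule in notice["rules"]
    )

EXAMPLE = r"""
notice acbd18db
type hipcrime_flood
expires 180
comment "hipcrime flood"
action hide
rule x-complaints-to: ^abuse@rr\.com$
path: !roadrunner\.com!
rule nntp-posting-host: (75\.184\.99\.32|72\.190\.76\.22)
"""

notice = parse_notice(EXAMPLE)
print(notice["notice"], notice["action"], len(notice["rules"]))
```

A real mechanism would also need signature checking and expiry handling, neither of which is sketched here.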

William Kronert

Dec 19, 2007, 4:00:18 PM
Julien ÉLIE <iul...@nom-de-mon-site.com.invalid> wrote:
> Hi Steve,

>> I've produced some updates to Cleanfeed to address the current sporge
>> type floods that are currently attacking a number of groups. In
>> accordance with the license I'm publishing these updates and making them
>> available should anyone else wish to use them.

> That's great. Thanks!

I have been using Steve's new filters for several weeks now and can
honestly say...they really do a great job.

As Steve pointed out, care should be taken when using the phn EMP
filter. You will need to tweak the exempt list.

I added the phr (high risk) EMP filter last week. It's the best thing
yet and really works like a charm. I haven't had one false positive from
the phr filter, and it very quickly and effectively stops the floods before
you can even notice them.

The two filters complement each other very nicely and offer a flexible
solution for filtering out these floods.

Thanks Steve.

> What would be really great is a place where newsmasters can *always*
> find the latest version of the bad_* files. It would indeed be very useful
> for filtering spam better.

Agreed, along with an ongoing update of cleanfeed in general.

Bill

Michael Grimm

Dec 24, 2007, 8:50:41 AM
Steve <st...@mixmin.net> wrote:

> I've produced some updates to Cleanfeed to address the current sporge
> type floods that are currently attacking a number of groups.

Thanks a lot, and it's highly appreciated!

> Configuration parameters are:
> phn_exempt      NNTP-Posting-Host regexes to exempt from the PHN filter
> phr_exempt      NNTP-Posting-Host regexes to exempt from the PHR filter
> phn_aggressive  Allow PHN filter to use Path elements (Here be dragons)
> phr_aggressive  Allow PHR filter to use Path elements
> phn_exclude     Newsgroup regexes to exclude from the PHN filter
> flood_groups    Regexes of groups to include in the PHR filter
>
> All of the above parameters should be defined in the cleanfeed.local
> file

I do assume that phn_exempt, phr_exempt, phn_exclude, and flood_groups
should be defined in cleanfeed.local's %config_append section? If so,
then they need to be appended in cleanfeed by:

#v+

--- cleanfeed.new Mon Dec 24 13:43:46 2007
+++ cleanfeed.old Mon Dec 24 13:40:42 2007
@@ -252,7 +252,6 @@
  foreach (qw(bin_allowed bad_bin md5exclude poison_groups
  allexclude html_allowed mime_html_allowed low_xpost_groups no_cancel_groups
  baddomainpat phl_exempt supersedes_exempt
- phn_exempt phr_exempt phn_exclude flood_groups
  refuse_messageids net_abuse_groups spam_report_groups
  adult_groups not_adult_groups faq_groups badguys)) {
  if (defined $config_append{$_}) {

#v-
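
The effect of that list can be sketched in Python (cleanfeed itself is Perl). The sketch assumes that appended values are joined to the built-in regex with "|", which is not confirmed in this thread; the function `apply_config_append` and all setting values are hypothetical. The point is only that a name missing from the list is silently ignored.

```python
def apply_config_append(config, config_append, appendable):
    # Merge local additions into the built-in settings, but only for names
    # that appear in the appendable list (assumed "|" regex alternation).
    merged = dict(config)
    for name in appendable:
        extra = config_append.get(name)
        if extra is not None:
            base = merged.get(name)
            merged[name] = f"{base}|{extra}" if base else extra
    return merged

config = {"phl_exempt": r"^news\.example$", "phn_exclude": r"^alt\.test"}
local = {"phn_exclude": r"\.talk"}   # hypothetical cleanfeed.local addition

before = apply_config_append(config, local, ["phl_exempt"])
after = apply_config_append(config, local, ["phl_exempt", "phn_exclude"])

print(before["phn_exclude"])   # local addition dropped: name not in the list
print(after["phn_exclude"])    # local addition merged once the name is listed
```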

I have used your new version out of the box for a couple of days now and
found some articles rejected for reasons I cannot understand,
e.g. (all from de.talk.misc):

<fkjipu$drn$9...@online.de> 437 EMP (phn nph)
<5uvv35-...@sansui.nixdos.de> 437 EMP (phn nph)

<MPG.21d76ffa9...@news.t-online.de> 437 EMP (phn path)

I do have to admit that my understanding of your new PHN filter is
rather poor. Perhaps someone can explain why these postings triggered
your filter. Thanks.

I did add '\.talk' to phn_exclude and changed both aggressive settings
to 0 in the meantime.

Gruß,
Michael
--
to let

Steve

Dec 24, 2007, 1:03:53 PM
On Mon, 24 Dec 2007 13:50:41 +0000 (UTC), Michael Grimm wrote in
Message-Id: <fkodfh$t6n$1...@odo.in-berlin.de>:

> I do assume that phn_exempt, phr_exempt, phn_exclude, and flood_groups
> should be defined in cleanfeed.local's %config_append section? If so,
> then they need to be appended in cleanfeed by:

Nice catch thanks! I've applied your patch and posted an update.

> I have used your new version out of the box for a couple of days now and
> found some articles rejected for reasons I cannot understand,
> e.g. (all from de.talk.misc):

I also see a large number of rejects for this group. The cause appears
to be that a small group of posters use the group as a chat medium and post
high volumes of two or three word messages in rapid response to each other.
The filter assumes that no genuine poster is going to post more than 40
messages per hour and so it rejects their messages when they exceed this
threshold.

> I do have to admit that my understanding of your new PHN filter is
> rather poor. Perhaps someone can explain why these postings triggered
> your filter. Thanks.

Hopefully I've explained it in part above. The PHN filter works by
creating a hash of the NNTP-Posting-Host and Newsgroups headers. This
means that each time a post comes from a single source to a newsgroup,
a counter for that combination is incremented. When a threshold is
hit, no more postings from that host are allowed to that newsgroup.

Some postings don't have an NNTP-Posting-Host header. If the aggressive
mode is switched on then these messages will be hashed on the Newsgroups
and an element of the Path header. This is aggressive because it limits
messages from a specific News service to a Newsgroup.
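
For readers who prefer code, the behaviour described above can be approximated by the following Python sketch. It is not the cleanfeed implementation: the class `RateLimiter`, the fixed one-hour window, and the choice of the second-to-last Path element in aggressive mode are all assumptions. Only the idea of a per-(source, Newsgroups) counter and the figure of 40 messages per hour come from this thread.

```python
from collections import defaultdict

class RateLimiter:
    def __init__(self, cutoff=40, window=3600, aggressive=False):
        self.cutoff = cutoff            # articles allowed per window
        self.window = window            # window length in seconds (assumed)
        self.aggressive = aggressive
        self.counts = defaultdict(int)
        self.window_start = {}

    def key_for(self, headers):
        source = headers.get("NNTP-Posting-Host")
        if source is None and self.aggressive:
            # Assumption: fall back to the second-to-last Path element.
            parts = headers.get("Path", "").split("!")
            source = parts[-2] if len(parts) >= 2 else None
        if source is None:
            return None                 # nothing to key on: not rate limited
        return (source, headers.get("Newsgroups", ""))

    def accept(self, headers, now):
        key = self.key_for(headers)
        if key is None:
            return True
        start = self.window_start.get(key)
        if start is None or now - start >= self.window:
            self.window_start[key] = now    # begin a fresh window
            self.counts[key] = 0
        self.counts[key] += 1
        return self.counts[key] <= self.cutoff

limiter = RateLimiter()
article = {"NNTP-Posting-Host": "192.0.2.10", "Newsgroups": "de.talk.misc"}
results = [limiter.accept(article, now=second) for second in range(41)]
print(results.count(True), results.count(False))
```

With aggressive mode on, every article injected by the same news service into the same group shares one counter, which is why that mode needs careful exemptions.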

> I did add '\.talk' to phn_exclude and changed both aggressive settings
> to 0 in the meantime.

That's probably a good compromise. I'd suggest leaving the
phr_aggressive switched on though. It only works if you define specific
groups (flood_groups) that you want it to act on.

Michael Grimm

Dec 25, 2007, 12:12:52 PM
Steve <st...@mixmin.net> wrote:
> On Mon, 24 Dec 2007 13:50:41 +0000 (UTC), Michael Grimm wrote in Message-Id: <fkodfh$t6n$1...@odo.in-berlin.de>:

>> I have used your new version out of the box for a couple of days now
>> and found some articles rejected for reasons I cannot understand,
>> e.g. (all from de.talk.misc):
>
> I also see a large number of rejects for this group. The cause
> appears to be that a small group of posters use the group as a chat
> medium and post high volumes of two or three word messages in rapid
> response to each other. The filter assumes that no genuine poster is
> going to post more than 40 messages per hour and so it rejects their
> messages when they exceed this threshold.

Hmm, what happens if 1) someone produces say 50 responses to a given
newsgroup offline and sends them as a single batch, and 2) if one
receives articles via UUCP batches polled only every couple of hours?
That might lead to unwanted article rejections, right?

OK, these situations are no longer very common, although
scenario 2) applies to me. So I will either have to tweak the RateCutoff
and/or RateBaseInterval settings of the cleanfeed filters I apply, or
increase my polling intervals significantly ;-)

>> I do have to admit that my understanding of your new PHN filter is
>> rather poor. Perhaps someone can explain why these postings triggered
>> your filter. Thanks.
>
> Hopefully I've explained it in part above. The PHN filter works by
> creating a hash of the NNTP-Posting-Host and Newsgroups headers. This
> means that each time a post comes from a single source to a newsgroup,
> a counter for that combination is incremented. When a threshold
> is hit, no more postings from that host are allowed to that newsgroup.

Thanks for your explanations. Now I do hopefully understand the filters
applied ;-)

In the meantime some more postings have been rejected by your default
settings plus '\.talk' for phn_exclude, all showing the same "chatting
nature". Therefore I increased PHNRateCutoff to 80 and PHNRateCeiling to
120 for the time being. The numbers are pure gut feeling, though ;-)

>> I did add '\.talk' to phn_exclude and changed both aggressive
>> settings to 0 in the meantime.
>
> That's probably a good compromise. I'd suggest leaving the
> phr_aggressive switched on though. It only works if you define
> specific groups (flood_groups) that you want it to act on.

Ok, I switched it on again.

Regards,
Michael
--
Everybody deserves the posting leaking his killfile ...

Steve

Dec 26, 2007, 7:30:50 AM
On Tue, 25 Dec 2007 17:12:52 +0000 (UTC), Michael Grimm wrote in
Message-Id: <fkrdmk$2a18$1...@odo.in-berlin.de>:

> Hmm, what happens if 1) someone produces say 50 responses to a given
> newsgroup offline and sends them as a single batch, and 2) if one
> receives articles via UUCP batches polled only every couple of hours?
> That might lead to unwanted article rejections, right?

Yes, in these instances if you batch-posted in excess of 40 articles to
a single newsgroup then you would get unwanted rejections. The default
of 40 was based on a few weeks of monitoring the filter and upping the
threshold until I saw very few false positives but I admit it's still a
compromise and probably should be higher. After introducing the PHR
filter I tend to use the PHN more as an alerting tool to identify which
groups are being flooded. A higher threshold would certainly accomplish
this as floods tend to measure in the thousands of articles.
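
The batch effect is easy to see with a toy simulation. The Python sketch below uses a simple fixed one-hour window, which is an assumption and not necessarily how cleanfeed ages its counters; the function `count_rejections` and the arrival times are made up for illustration.

```python
def count_rejections(arrival_times, cutoff=40, window=3600):
    # Count articles from one source to one group that exceed the cutoff
    # within a fixed window (assumed model).
    rejected = 0
    window_start = None
    seen = 0
    for t in sorted(arrival_times):
        if window_start is None or t - window_start >= window:
            window_start, seen = t, 0
        seen += 1
        if seen > cutoff:
            rejected += 1
    return rejected

spread = [i * 360 for i in range(50)]   # one article every six minutes
batch = [18000] * 50                    # the same 50 delivered in one UUCP poll

print(count_rejections(spread))             # trickle: stays under the cutoff
print(count_rejections(batch))              # batch: the excess over 40 is rejected
print(count_rejections(batch, cutoff=80))   # higher cutoff absorbs the batch
```

The same 50 articles are harmless when they trickle in and trip the filter when they arrive together, which is why batch-fed sites may want a higher cutoff.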

> In the meantime some more postings have been rejected by your default
> settings plus '\.talk' for phn_exclude, all showing the same "chatting
> nature". Therefore I increased PHNRateCutoff to 80 and PHNRateCeiling to
> 120 for the time being. The numbers are pure gut feeling, though ;-)

Sounds reasonable. I'll do some more studying of what's getting caught
and will up the defaults accordingly. I'd prefer to catch too little
rather than too much. :)

Michael Grimm

Dec 27, 2007, 4:07:23 PM
Steve <st...@mixmin.net> wrote:
> On Tue, 25 Dec 2007 17:12:52 +0000 (UTC), Michael Grimm wrote in Message-Id: <fkrdmk$2a18$1...@odo.in-berlin.de>:

> After introducing the PHR filter I tend to use the PHN more as an
> alerting tool to identify which groups are being flooded. A higher
> threshold would certainly accomplish this as floods tend to measure in
> the thousands of articles.

Taking these numbers into account, ...

>> Therefore I increased PHNRateCutoff to 80 and PHNRateCeiling to 120
>> for the time being. The numbers are pure gut feeling, though ;-)

... still rejecting a considerable amount of "chatty" articles, ...

> Sounds reasonable. I'll do some more studying of what's getting
> caught and will up the defaults accordingly. I'd prefer to catch too
> little rather than too much. :)

and agreeing with your last sentence, I have increased PHNRateCutoff to
200 and PHNRateCeiling to 250 for the time being. I'll let you know.

Regards,
Michael
--
to let

Michael Grimm

Jan 4, 2008, 1:47:58 PM
Michael Grimm <tras...@odo.in-berlin.de> wrote:

> I have increased PHNRateCutoff to 200 and PHNRateCeiling to 250 for
> the time being. I'll let you know.

JFTR: After one week without a single false positive, I decreased
those numbers to 150 and 200 respectively. Now I'm occasionally
receiving false positives in comp.lang.ruby (cross-linked with ML
ruby-talk-google?).

Well, I think I will revert to 200 and 250 for the time being.

Regards,
Michael

Steve

Jan 5, 2008, 11:23:56 AM
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Fri, 4 Jan 2008 18:47:58 +0000 (UTC), Michael Grimm wrote in
Message-Id: <fllv0u$2cn2$2...@odo.in-berlin.de>:

> JFTR: After one week without a single false positive, I decreased
> those numbers to 150 and 200 respectively. Now I'm occasionally
> receiving false positives in comp.lang.ruby (cross-linked with ML
> ruby-talk-google?).
>
> Well, I think I will revert to 200 and 250 for the time being.

Thanks for the feedback Michael.

I've up'd the default values:
PHNRateCutoff => 100
PHNRateCeiling => 150

Whilst not as high as your preferred setting, this is a lot more than
the previous default values of 40/80 respectively. I'll run with these
for a while and see how they work out.

I've also changed the default flood_groups to none. The setting was
good for the currently experienced floods but IMO it should be defined
solely by the operator in cleanfeed.local.

http://www.mixmin.net/cleanfeed
http://www.mixmin.net/cleanfeed.diff
http://www.mixmin.net/cleanfeed.asc (Signature)

Steve

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHf68ctHGA1SKHYecRCpj9AJ9t7AAGRac7GyUg3C2YHOJrqtUpLgCdEyAH
PcfSO7Rgvqst5gtoohp+04g=
=FV9N
-----END PGP SIGNATURE-----

Michael Grimm

Jan 20, 2008, 11:05:09 AM
Steve <st...@mixmin.net> wrote:
> On Fri, 4 Jan 2008 18:47:58 +0000 (UTC), Michael Grimm wrote in Message-Id: <fllv0u$2cn2$2...@odo.in-berlin.de>:

>> Well, I think I will revert back to 200 and 250 for the time being.
>

> I've up'd the default values:
> PHNRateCutoff => 100
> PHNRateCeiling => 150

In addition to my values mentioned above, I did simultaneously reduce
PHNRateBaseInterval to 600 seconds because I had the feeling that this
would suit my INN's feeding via UUCP batches better:

'PHNRateCutoff' => 200
'PHNRateCeiling' => 250
'PHNRateBaseInterval' => 600

JFTR: I have had only very few false positives during the last 16 days,
only coming from groups "misused" for chatting. I will keep these
settings for the future.

> I've also changed the default flood_groups to none. The setting was
> good for the currently experienced floods but IMO it should be defined
> solely by the operator in cleanfeed.local.
>
> http://www.mixmin.net/cleanfeed
> http://www.mixmin.net/cleanfeed.diff
> http://www.mixmin.net/cleanfeed.asc (Signature)

Thanks for improving cleanfeed,
Michael
--
to let
