
pan spam regex


whileone

Nov 9, 2009, 10:12:36 AM
The Pan newsreader (maybe there are better choices) has
a form for filtering out spam (author contains fjrjtrad, set priority to
zero... or subject contains penis, set priority to zero).

That form also has a regex feature.
Is there a regex that would match all the odd not-normal-ascii
characters these turkeys like to use (characters that show up
as phone icons, musical notes, etc.)?
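For illustration only, here is a minimal Python sketch of the idea behind the question: a character class that matches anything outside 7-bit ASCII (U+0000 to U+007F). It assumes the filter sees the header already decoded to Unicode text; raw headers often carry RFC 2047 encoded-words (=?UTF-8?B?...?=), and I have not checked whether Pan matches against the decoded or the raw form. The function name has_non_ascii is hypothetical.

```python
import re

# Any single character outside the 7-bit ASCII range U+0000..U+007F.
# Assumption: the header text has already been decoded to Unicode.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(text: str) -> bool:
    """Return True if text contains at least one non-ASCII character."""
    return NON_ASCII.search(text) is not None


if __name__ == "__main__":
    for subject in ["Re: pan spam regex", "\u260e Call now \u266b"]:
        print(repr(subject), has_non_ascii(subject))
```

The second subject (a phone symbol and a musical note) is flagged; the plain ASCII subject is not.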

Seebs

Nov 9, 2009, 10:23:24 AM
On 2009-11-09, whileone <whil...@somewhere.net> wrote:
> That form also has a regex feature.
> Is there a regex that would match all the odd not-normal-ascii
> characters these turkeys like to use (characters that show up
> as phone icons, musical notes, etc)

Maybe!

You might try [^[:graph:]], although I honestly can't say whether
this works or not. Quick messing around with other programs suggests
that something to that effect might work; I also got interesting results
from [^ -~].

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
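A quick way to see what [^ -~] does, sketched in Python: the range runs from space (0x20) to tilde (0x7E), i.e. printable ASCII, so the negated class matches control characters such as tab as well as every non-ASCII character. Python's re module does not support POSIX bracket classes, so [^[:graph:]] is not shown; note also that [:graph:] excludes the space character, so [^[:graph:]] would match ordinary spaces too, and in the POSIX locale [^[:print:]] is the closer counterpart of [^ -~]. Whether Pan's regex engine accepts POSIX classes at all is an assumption I have not verified. The helper name suspicious_chars is hypothetical.

```python
import re

# [^ -~]: any character NOT between space (0x20) and tilde (0x7E),
# i.e. anything that is not printable ASCII. Tabs, newlines and all
# non-ASCII characters match.
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def suspicious_chars(text: str) -> list[str]:
    """Return every character in text that falls outside printable ASCII."""
    return NOT_PRINTABLE_ASCII.findall(text)


if __name__ == "__main__":
    print(suspicious_chars("Cheap watches \u231a\u266a"))
    print(suspicious_chars("Re: pan spam regex"))
```

The watch and note symbols are returned, while the plain subject yields an empty list, since ordinary spaces fall inside the allowed range.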

Rikishi42

Nov 9, 2009, 5:41:02 PM
On 2009-11-09, whileone <whil...@somewhere.net> wrote:

You could also exclude anything with
User-Agent: G2/1.0
in the header.

Gets most junk.

--
Any time things appear to be going better, you have overlooked
something.
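The header-based rule can be sketched outside Pan with Python's standard email module. This is not how Pan implements it; it only shows the check itself. Matching any value that starts with "G2/" (rather than exactly "G2/1.0") is an assumption that broadens the rule, and the function name posted_via_google_groups is hypothetical.

```python
from email import message_from_string
from email.message import Message


def posted_via_google_groups(msg: Message) -> bool:
    """Hypothetical rule: flag posts whose User-Agent header starts with "G2/"."""
    agent = str(msg.get("User-Agent", ""))
    return agent.strip().startswith("G2/")


if __name__ == "__main__":
    raw = (
        "From: someone@example.com\n"
        "Subject: test\n"
        "User-Agent: G2/1.0\n"
        "\n"
        "body\n"
    )
    print(posted_via_google_groups(message_from_string(raw)))
```

Posts without a User-Agent header, or with a different agent, are left alone.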

stan

Nov 9, 2009, 6:23:35 PM
Seebs wrote:
> On 2009-11-09, whileone <whil...@somewhere.net> wrote:
>> That form also has a regex feature.
>> Is there a regex that would match all the odd not-normal-ascii
>> characters these turkeys like to use (characters that show up
>> as phone icons, musical notes, etc)
>
> Maybe!
>
> You might try [^[:graph:]], although I honestly can't say whether
> this works or not. Quick messing around with other programs suggests
> that something to that effect might work; I also got interesting results
> from [^ -~].

It's terribly hard to idiotproof things; idiots are clever and cunning :)

To me it seems easier to go the other way and simply accept alphanumerics,
maybe [[:alnum:]] or something. I know next to nothing about i18n,
locales, or Unicode issues, but it seems alphanumerics would be in the
ballpark for many/most environments.
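The whitelist idea can be sketched in Python as well. This is one possible reading, assuming the allowed set is ASCII letters, digits, ASCII punctuation and the plain space (tabs and other whitespace are rejected). It also illustrates the i18n worry: Python's str.isalnum() is Unicode-aware and accepts letters such as "é", so an "alphanumeric only" rule behaves differently depending on whether "alphanumeric" means ASCII or Unicode. The name only_allowed is hypothetical.

```python
import string

# Whitelist: ASCII letters, digits, ASCII punctuation and the plain space.
# Assumption: any character outside this set is treated as suspicious.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def only_allowed(text: str) -> bool:
    """True if every character of text is on the ASCII whitelist."""
    return all(ch in ALLOWED for ch in text)


if __name__ == "__main__":
    print(only_allowed("Re: pan spam regex"))
    print(only_allowed("\u266b FREE \u266b"))
    print(only_allowed("Caf\u00e9"))  # rejected: "é" is not ASCII
    print("\u00e9".isalnum())  # but Unicode-aware isalnum() accepts it
```

A stricter ASCII whitelist rejects legitimate accented subjects, while a Unicode-aware one lets through anything the locale considers a letter, which is roughly the trade-off raised above.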

