My first experiments with simple custom rules all worked fine
/^a\w+$/ rejects junk addressed to a*
/^\d+snz$/ rejects junk addressed to old Snews articles.
But I seem to run into trouble when modifiers are involved.
The regular expression grammar appears to be a subset? of egrep. This
makes me wonder if I can enter unusual characters with escapes eg.
\xa4 or \244 for the character ¤ which is almost invariably unique to
the subject headers of big5 encoded bulk UCE dross. Or do I want "=A4"
text?
Here is a concrete example where I have a bunch of rules that I think
should have triggered on the following header to file it in "Junk".
The rules are:
/\.tw/f
/tw>$/f
/\.seed\.net\.tw>$/f
/\xa4/s
/=a4/s
/\.txt$/h
/^To:\t.+\.txt,\t/h
I know some of these would risk serious collateral damage if applied in
their present form, but since I couldn't get any of these rules to work
at all I have tried my best to simplify them to avoid any possible
typos.
I think most of these should have triggered and still can't spot my
mistake.
And here is the header from the offending junk big5 encoded UCE:
Return-Path: <UwlLve...@party.seed.net.tw>
Received: from tele-punt-22.mail.demon.net ([194.217.242.7])
by nezumi.demon.co.uk with SMTP id
<sYSZLAAm...@nezumi.demon.co.uk>
for <mar...@nezumi.demon.co.uk> ; Thu, 19 Jun 2003 10:53:42 +0100
Received: from punt-2.mail.demon.net by mailstore for
mar...@nezumi.demon.co.uk
id 1056007911:20:19632:66; Thu, 19 Jun 2003 07:31:51 GMT
Received: from [90.13.30.61.isp.tfn.net.tw] ([61.30.13.90])
by punt-2.mail.demon.net id aa2105173; 19 Jun 2003 7:31 GMT
Received: from pchome
by tcts1.seed.net.tw with SMTP id rNWkg0oDxm8tAfKFk6h;
Thu, 19 Jun 2003 15:31:34 +0800
Message-ID: <qDuy...@kimo.com>
From: en...@grace.com
To: export_4000000.txt, export_2000000.txt, export_3000000.txt
Subject:=?big5?Q?=A1u=A4G=B2=B4=A4=D1=AF]=A1v-
=BCW=AA=F8=AB=C4=A4l=B4=BC=BCz=AA=BA=A8=AD=A4=DF=C6F=B1K=C4_?=
MIME-Version: 1.0
Content-Type: multipart/related;
type="multipart/alternative";
boundary="----=_NextPart_Zix89s9iEcfC2Iws1XSqOEhTxHfD"
X-Mailer: B2ScVIxiqPyJyc4CQ
X-Priority: 3
X-MSMail-Priority: Normal
[snip]
I expect I could bounce this on envelope by entering a simple rule on
the reverse path to zap everything from *.seed.net.tw (*.tw is
tempting).
However, I would very much like to know what I am doing wrong with the
TP custom rule syntax that prevents me from writing an equivalent.
Thanks for any enlightenment.
Regards,
--
Martin Brown
> /\.tw/f
> /tw>$/f
> /\.seed\.net\.tw>$/f
> /\xa4/s
> /=a4/s
> /\.txt$/h
> /^To:\t.+\.txt,\t/h
Regardless of any other complications in your approach, most of these
will not work and you should get an error message to say so - illegal
modifier if you try to use them in envelope rejection.
Look at the TP help for what is included in the envelope - the forward
path and the reverse path.
There is no subject and headers are contained in the message itself, not
the envelope.
The only modifiers that are allowed are:
/f The reverse path (as in the SMTP envelope - it may be the same
as the From: header but need not be)
/T The forward path (as in the SMTP envelope which is not the same
as the To: header or indeed any header and may not even be
contained in the message at all).
/u The local part of the forward path - the bit before the @.
The complete table of modifiers and the places in which they can be used
is given in the help page entitled Custom Rules - modifiers.
It may help if you think of the envelope as, well, for want of a better
word, an envelope. Postmen deliver letters to the address they can see
on (or through) the envelope. They don't open the envelope and take an
address from what is written on the paper inside it - and that might
have many more addresses than yours and may not even have yours at all
if it is merely a copy of a letter or even put in the wrong envelope. The
email system delivers in the same way - using the envelope, not the
contents of the message.
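To make the envelope/contents distinction concrete, here is a minimal sketch in Python (not TP's own code). The Message class and the field_for helper are hypothetical names invented for illustration; the addresses are the truncated ones from the sample header, and the mapping of /f, /T and /u follows the list above.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Message:
    reverse_path: str                 # envelope: from the SMTP MAIL FROM
    forward_path: str                 # envelope: from the SMTP RCPT TO
    headers: list = field(default_factory=list)  # inside the message itself

def field_for(modifier, msg):
    """Return the envelope text a modifier tests (hypothetical helper)."""
    if modifier == "f":
        return msg.reverse_path
    if modifier == "T":
        return msg.forward_path
    if modifier == "u":
        # local part of the forward path: the bit before the @
        return msg.forward_path.strip("<>").split("@", 1)[0]
    raise ValueError(f"illegal modifier for envelope rules: /{modifier}")

# Addresses are truncated exactly as they appear in the posted header.
msg = Message(
    reverse_path="<UwlLve...@party.seed.net.tw>",
    forward_path="<mar...@nezumi.demon.co.uk>",
    headers=["From: en...@grace.com"],
)

# /\.tw/f looks at the reverse path, not at the From: header.
hit_on_envelope = re.search(r"\.tw", field_for("f", msg)) is not None
hit_on_from_header = re.search(r"\.tw", msg.headers[0]) is not None
```

In this sketch a /s or /h modifier raises an error, which mirrors the "illegal modifier" message described above for envelope rejection.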
As to the syntax and what you may use it is fully defined in the help
pages. If something is not mentioned there it is very likely that it is
not provided for in TP.
--
John Underwood
The Reply-To: address will remain in use for at least 30 days, you may write to
me using jo...@theunderwoods.org.uk. Mail to the From: address may be
rejected or a complaint raised about it as UCE/Spam.
The above should all be valid envelope rejection rules (but apparently
did nothing at all when tested routing to "junk").
>> /\xa4/s
>> /=a4/s
>> /\.txt$/h
>> /^To:\t.+\.txt,\t/h
These are in the hope of diverting bulk UCE that gets through into a
directory called "junk". Again none of them were triggered.
The title of this thread may be misleading - the problem I have seems to
be that custom rules seem to work for me provided they do not use
modifiers.
>
>Regardless of any other complications in your approach, most of these
>will not work and you should get an error message to say so - illegal
>modifier if you try to use them in envelope rejection.
They were being tested routing to junk. I listed all the rules I thought
should or might have been triggered by that header. Some of them had
been extremely simplified to try and ensure they would trigger.
>
>Look at the TP help for what is included in the envelope - the forward
>path and the reverse path.
>There is no subject and headers are contained in the message itself, not
>the envelope.
Yes. I realise that. But I still don't see why none of these rules
triggered.
.seed.net.tw> *was* in the reverse path.
>The only modifiers that are allowed are:
>
>/f The reverse path (as in the SMTP envelope - it may be the same
> as the From: header but need not be)
>
>/T The forward path (as in the SMTP envelope which is not the same
> as the To: header or indeed any header and may not even be
> contained in the message at all).
>
>/u The local part of the forward path - the bit before the @.
>
>The complete table of modifiers and the places in which they can be used
>is given in the help page entitled Custom Rules - modifiers.
I understand this. My point was that I appear to have a whole set of
rules that are not having any effect on routing inbound mail into my
junk directory. Other rules have previously worked as expected.
>As to the syntax and what you may use it is fully defined in the help
>pages. If something is not mentioned there it is very likely that it is
>not provided for in TP.
Fair enough. So how do I match for a character with top bit set ?
Perhaps the other question I should have asked is how do I select
against mail with a subject containing a known top bit set character?
TP displays such characters as their ASCII escape code =A4, but how do I
make a custom rule to match it ?
Regards,
--
Martin Brown
>
>Fair enough. So how do I match for a character with top bit set ?
Your title was not just misleading, it put the discussion into the wrong
universe of discourse.
However, even on other aspects of the matters to which routeing and
rejection can be applied, in most cases, you can't deal with the top
bit, the headers use 7-bit ASCII so they don't have the top bit set.
I think there are large areas here which it would be helpful if you
could sort out in your own mind before asking the question. I am not
saying you need to know all the answers, just try and ask questions
without preconditioning the area in which the answer is to be given.
For example, you say you understand the table of allowable modifiers,
but then ask questions which suggest that either you don't or that the
question relates to something completely different.
Or, to put it another way, I give up, I haven't a clue what your problem
is.
[snip]
>The rules are:
>
> /\.tw/f
> /tw>$/f
> /\.seed\.net\.tw>$/f
These three do not work because the reverse path is "en...@grace.com"
> /\xa4/s
This doesn't work because this escape sequence is not part of the TP
regular expressions
> /=a4/s
I am not sure why this one doesn't work: I am tempted to say that it is
because the case is wrong, but the help suggests that to make case
significant you should add a 'c' after the 's'. It might be worth trying
/=A4/s
and/or
/=A4/sc
If these don't work I think the subject must be converted to 8bit before
testing and you would have to insert the 8bit character itself in your
rule.
> /\.txt$/h
This looks as if it should work with the headers below. I am at a loss
unless TP is somehow getting the line-wrapping logic to attach the
Subject line to the To: line. This might be an alternative explanation
for why the =a4 rule didn't work.
> /^To:\t.+\.txt,\t/h
\t represents a tab; the character you want to match is a space.
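The header rules can be checked outside TP with a rough sketch like the one below. It uses Python's re module, which is not TP's regular expression engine, so it only shows what the patterns do against the posted text. It assumes TP ignores case unless the 'c' modifier is given (modelled with re.IGNORECASE); the matches helper is a name made up for this example.

```python
import re

# The To: line from the sample header. A single space follows the colon
# and each comma; there are no tab characters in it.
to_header = "To: export_4000000.txt, export_2000000.txt, export_3000000.txt"
# First line of the sample Subject header, copied as posted.
subject = "Subject:=?big5?Q?=A1u=A4G=B2=B4=A4=D1=AF]=A1v-"

def matches(pattern, text, case_sensitive=False):
    """Hypothetical helper: case is ignored unless asked for (assumption)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None

ends_in_txt = matches(r"\.txt$", to_header)             # line ends in ".txt"
tab_version = matches(r"^To:\t.+\.txt,\t", to_header)    # demands tabs
space_version = matches(r"^To: .+\.txt, ", to_header)    # spaces instead
a4_any_case = matches(r"=a4", subject)                   # case ignored
a4_exact_case = matches(r"=a4", subject, case_sensitive=True)

print(ends_in_txt, tab_version, space_version, a4_any_case, a4_exact_case)
```

Only the tab version and the case-sensitive =a4 fail to match here, which suggests the patterns themselves were mostly sound and the cause lay elsewhere.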
I would also be interested to know the answer to this problem.
--
John Lowe
jr...@bytetype.co.uk
A preliminary question to Martin, are you sure that your "junk" folder
is high enough up the folder routeing list ?
An email will go into the *first* folder with a matching rule. Each time
you add a folder, it goes almost to the bottom of the list (just above
Inbox which must be the last on the list IIRC) - you can then move it up
or down by selecting Configure / Folder Routeing
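The first-match behaviour can be sketched as follows. This is an illustration in Python, not TP's implementation: the route function, the folder names and the rule for the "Martin" folder are assumptions made up for the example, and the local part "martin" is assumed from the discussion rather than taken from the (truncated) header.

```python
import re

def route(fields, folders):
    """Return the first folder, in list order, with a matching rule."""
    for name, rules in folders:
        for pattern, modifier in rules:
            if re.search(pattern, fields.get(modifier, ""), re.IGNORECASE):
                return name
    return "Inbox"  # nothing matched: falls through to the last folder

# Envelope fields keyed by modifier: f = reverse path, u = local part.
fields = {"f": "<UwlLve...@party.seed.net.tw>", "u": "martin"}

# A newly added folder sits near the bottom of the routeing list ...
as_created = [
    ("Martin", [(r"^martin$", "u")]),   # hypothetical existing rule
    ("Junk", [(r"\.tw", "f")]),
]
# ... until it is moved up via Configure / Folder Routeing.
junk_moved_up = [as_created[1], as_created[0]]

print(route(fields, as_created))     # Martin
print(route(fields, junk_moved_up))  # Junk
```

With "Junk" below a folder that already claims the message, the junk rules are never consulted, however correct they are.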
>
>>The rules are:
>>
>> /\.tw/f
If the email got as far as this rule, I would have expected it to work
>> /tw>$/f
>> /\.seed\.net\.tw>$/f
>
>These three do not work because the reverse path is "en...@grace.com"
No, that was the From: header, not the reverse path
I would be reluctant to use the angle bracket as the Help File says that
this is taken from the MAIL FROM: command in the SMTP dialogue and there
is no absolute guarantee that the mail server will have included the
angle brackets - they are not mandatory.
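One way to avoid depending on the brackets is to make the closing ">" optional. The sketch below shows the idea with Python's re module; whether TP's regular expressions accept the "?" quantifier is an assumption that should be checked against the help pages (egrep does support it).

```python
import re

# ">" required at the end, as in the original rule.
strict = re.compile(r"\.seed\.net\.tw>$", re.IGNORECASE)
# ">" optional, so both forms of the reverse path match.
tolerant = re.compile(r"\.seed\.net\.tw>?$", re.IGNORECASE)

# Reverse path as posted (truncated), with and without angle brackets.
with_brackets = "<UwlLve...@party.seed.net.tw>"
without_brackets = "UwlLve...@party.seed.net.tw"

for path in (with_brackets, without_brackets):
    print(path, bool(strict.search(path)), bool(tolerant.search(path)))
```

If "?" turns out not to be available, two separate rules (one ending in "tw>$" and one in "tw$") would cover the same cases.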
>
[Snip]
>
>> /=a4/s
>
>I am not sure why this one doesn't work
Neither am I, but it might be worth escaping the equal sign as another
test
>
>> /\.txt$/h
>
>This looks as if it should work with the headers below.
Not at all. There is *no* header in Martin's sample that *ends* with the
string .txt
HTH
--
Robert
This information provided free of charge for those willing to accept
it. Others who wish to be spoon-fed may acquire my services at the
discounted rate of 80 GB Pounds per hour or part thereof.
OK
>
>I would be reluctant to use the angle bracket as the Help File says
>that this is taken from the MAIL FROM: command in the SMTP dialogue and
>there is no absolute guarantee that the mail server will have included
>the angle brackets - they are not mandatory.
>>
>[Snip]
>>
>>> /=a4/s
>>
>>I am not sure why this one doesn't work
>
>Neither am I, but it might be worth escaping the equal sign as another
>test
>>
>>> /\.txt$/h
>>
>>This looks as if it should work with the headers below.
>
>Not at all. There is *no* header in Martin's sample that *ends* with
>the string .txt
Not even this one?
To: export_4000000.txt, export_2000000.txt, export_3000000.txt
>
>HTH
--
John Lowe
jr...@bytetype.co.uk
Thank you Robert. You have got it in one. I am very new to TP.
I had assumed naively that the order of rule search was the same as the
order in which the folders appear in the TP browse window. Junk came
before Martin there, but alas not in the folder routing list.
The other custom rules that had all behaved exactly as expected worked
perfectly because the addressee was not martin.
>An email will go into the *first* folder with a matching rule. Each
>time you add a folder, it goes almost to the bottom of the list (just
>above Inbox which must be the last on the list IIRC) - you can then
>move it up or down by selecting Configure / Folder Routeing
>>
>>>The rules are:
>>>
>>> /\.tw/f
>
>If the email got as far as this rule, I would have expected it to work
I suspect now that it would. I will try it out against hinet.net
It is coming up to the weekend so plenty of bulk UCE to test it on.
Ironically it looks like my testing method was the root cause of my
confusion - had I put the original regular expression into place acting
on the envelope untested it would almost certainly have worked first
time.
>
>>> /tw>$/f
>>> /\.seed\.net\.tw>$/f
>>
>>These three do not work because the reverse path is "en...@grace.com"
>
>No, that was the From: header, not the reverse path
>
>I would be reluctant to use the angle bracket as the Help File says
>that this is taken from the MAIL FROM: command in the SMTP dialogue and
>there is no absolute guarantee that the mail server will have included
>the angle brackets - they are not mandatory.
I had tried all permutations. With a bit of luck now that the rules get
a chance of being triggered I will be able to make some progress.
>HTH
It did. I was sure I must be overlooking something blindingly obvious.
Regards,
--
Martin Brown
The main thrust was why don't these regular expressions work given the
following header example which it appears should trigger most of them.
"How to escape a top bit set character" into a rule was an aside.
>
>However, even on other aspects of the matters to which routeing and
>rejection can be applied, in most cases, you can't deal with the top
>bit, the headers use 7-bit ASCII so they don't have the top bit set.
Odd then that through Demon using TP I can receive a header containing a
Subject line exactly like this -
Subject: ±z›Ýn¥~Äy¤k¶Ä¶Ü?(¶V.μá.®õ)
NB various top bit set characters above may display oddly.
Note that in the above subject header there are no fewer than 3 \xa4
characters.
This was cut & pasted directly from TP displayed headers so it displays
them exactly as is and proves that the mail transport system is
preserving 8 bit characters unmolested on some routes. Other mail from
the Far East comes in with ASCII escape sequences in the headers.
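The two forms can be compared with a short Python sketch. It says nothing about which form TP actually tests a /s rule against, which is the open question here; it only shows that a rule written for one form cannot match the other. The payload is a fragment of the encoded Subject from the earlier sample header.

```python
import quopri
import re

# Fragment of the Q-encoded subject as it travels in 7-bit form.
encoded = b"=A1u=A4G=B2=B4=A4=D1"

# The same fragment decoded to the raw 8-bit bytes it stands for.
raw = quopri.decodestring(encoded, header=True)

# A rule looking for the literal text "=A4" only fits the encoded form.
text_rule_on_encoded = re.search(rb"=A4", encoded) is not None
text_rule_on_raw = re.search(rb"=A4", raw) is not None

# A rule looking for the byte 0xA4 only fits the 8-bit form.
byte_rule_on_encoded = re.search(rb"\xa4", encoded) is not None
byte_rule_on_raw = re.search(rb"\xa4", raw) is not None

print(raw.count(b"\xa4"))
```

So covering both kinds of mail would need one rule per form: "=A4" as text for the encoded headers, and the 8-bit character itself for headers that arrive unencoded.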
>Or, to put it another way, I give up, I haven't a clue what your
>problem is.
That seems clear enough.
Regards,
--
Martin Brown
I had also made the same assumption, which should solve a problem I had
with email routeing. Thanks Robert.
However I have another problem with the folder routeing window; that of
about ten folders, only three have names. I can recognise them by their
rule sets, but it seems odd to have folders with no names. Is it
something to do with how I have set them up - or a bug (heaven forfend)
- or what...
--
John Lowe
jr...@bytetype.co.uk